1#if defined HAVE_CONFIG_H 2#include "config.h" 3#endif 4 5!!@LICENSE 6! 7!****************************************************************************** 8! MODULE mesh3D 9! Defines and handles different parallel distributions of a 3D mesh of points. 10! Written by J.M.Soler. Jan-July.2008 11!****************************************************************************** 12! 13! PUBLIC procedures available from this module: 14! addMeshData : Adds the data in a box array to the equiv. unit-cell points 15! associateMeshTask : Associates a communication task to mesh distr. 16! copyMeshData : Copies data in a unit-cell array to another mesh box array 17! fftMeshDistr : Sets four mesh distributions to perform FFTs 18! freeMeshDistr : Frees a distribution index (they are limited to maxDistr) 19! freeMeshTask : Frees a task index (they are limited to maxTasks) 20! myMeshBox : Returns mesh box of my processor node 21! nodeMeshBox : Returns mesh box of any given processor node 22! redistributeMeshData : Copies data between different mesh distributions 23! sameMeshDistr : Finds if two mesh distributions are equal 24! setMeshDistr : Defines a distribution of mesh points over parallel nodes 25! 26! PUBLIC parameters, types, and variables available from this module: 27! none 28! 29!****************************************************************************** 30! 31! USED module procedures: 32! use alloc, only: de_alloc ! De-allocation routine 33! use alloc, only: re_alloc ! Re-allocation routine 34! use sys, only: die ! Termination routine 35! use sorting, only: ordix ! Finds the sorting index of a vector 36! use m_array, only: array_copy ! Copies flattened arrays (instead of reshaping) 37! use m_array, only: array_add ! Adds flattened arrays (instead of reshaping) 38! 39! USED module parameters: 40! use precision, only: dp ! Real double precision type 41! use precision, only: gp=>grid_p ! Grid-data real precision type 42! use parallel, only: myNode=>node ! This process node 43! use parallel, only: totNodes=>nodes ! Total number of processor nodes 44! 45! EXTERNAL procedures used: 46! none 47! 48! Private fixed module parameters (see/set values in code): 49! integer maxDistr : Max. mesh distributions allocated at any given time 50! integer maxTasks : Max. communication tasks allocated at any given time 51! integer maxDistrID : Max. IDs assigned to the same distrib. 52! integer maxTaskID : Max. IDs assigned to the same task 53! integer maxParts : Max. parts of a mesh box, in different unit cells 54! integer maxDistrTasks : Max. commun. tasks associated to a mesh distribution 55! 56!****************************************************************************** 57! 58! SUBROUTINE addMeshData( nMesh, srcBox, srcData, dstDistr, dstData, task ) 59! 60! Adds (reduces) the values defined in a 3D mesh box to the equivalent mesh 61! points in a periodic unit cell. The input box may be entirely inside, or 62! partially or totally outside the the unit cell, and it is folded to add 63! all its values to the corresponding unit-cell points. 64! In parallel execution, the routine handles the interprocessor transfers 65! needed to send the box data to the nodes that will store them. 66!------------------------------ INPUT ----------------------------------------- 67! integer nMesh(3) : Global mesh size in each axis 68! integer srcBox(2,3) : Source mesh box: box(1,iAxis)=lower box limits 69! box(2,iAxis)=upper box limits 70! The box may be partially or totally outside the first 71! unit cell (FUC). It is folded to the FUC in order to 72! assign the values of srcData to those of dstData. 73! real(gp) srcData(0:,0:,0:,:) : Source data values to be copied. 74! The first three upper array bounds must be equal or 75! larger than srcBox(2,iAxis)-src(1,iAxis). Real type gp 76! is identical to grid_p, defined in module precision. 77! integer dstDistr : Distrib. ID of destination data, or zero for serial 78! execution 79!----------------------- INPUT and OUTPUT ------------------------------------- 80! real(gp) dstData(0:,0:,0:,:) : Destination data array. Its upper array bounds 81! must be enough to contain the node box in the dstDistr 82! parallel distrib or nMesh(iAxis)-1 in serial execution 83! It is NOT initialized: its input values are incremented 84! with the source array values. 85!-------------------- OPTIONAL INPUT and OUTPUT ------------------------------- 86! integer task : ID of communication task. It allows to save information on an 87! optimized interprocessor communication pattern, required to 88! exchange data between source and destination mesh boxes. 89! This info is used in future calls with the same srcBox and 90! dstDistr. task value must be saved between calls. See also 91! associateMeshTask. 92!----------------------------- UNITS ------------------------------------------ 93! Units of srcData and dstData are arbitrary but they must be equal 94!----------------------------- USAGE ------------------------------------------ 95! Typical usage for parallel execution is: 96! use precision, only: gp=>grid_p 97! use mesh3D, only: addMeshData, associateMeshTask, myMeshBox, setMeshDistr 98! integer:: box1(2,3), box2(2,3), myBox(2,3), nData, nMesh(3) 99! integer,save:: myDistr=-1, task1=-1, task2=-1 100! real(gp),allocatable:: myData(:,:,:,:), data1(:,:,:,:), data2(:,:,:,:) 101! ... Find/define nMesh 102! call setMeshDistr( myDistr, nMesh ) 103! call myMeshBox( nMesh, myDistr, myBox ) 104! allocate( myData(myBox(1,1):myBox(2,1), & 105! myBox(1,2):myBox(2,2), & 106! myBox(1,3):myBox(2,3),1:nData) ) 107! myData = 0 108! ... Find/define box1 and box2 109! allocate( data1(box1(1,1):box1(2,1), & 110! box1(1,2):box1(2,2), & 111! box1(1,3):box1(2,3),1:nData) ) 112! allocate( data2(box2(1,1):box2(2,1), & 113! box2(1,2):box2(2,2), & 114! box2(1,3):box2(2,3),1:nData) ) 115! ... Find data1 and data2 116! call associateMeshTask( task1, myDistr ) 117! call associateMeshTask( task2, myDistr ) 118! call addMeshData( nMesh, box1, data1, myDistr, myData, task1 ) 119! call addMeshData( nMesh, box2, data2, myDistr, myData, task2 ) 120! This usage is also valid in serial execution, although it could be 121! simplified in this case (e.g. the call to setMeshDistr could be replaced by 122! myDistr=0). 123! The optimization of the communication task is done at the end of the first 124! call with given srcBox and dstDistr. Therefore, the use of the task argument 125! is not advised unless the same srcBox and dstDistr are used repeatedly. 126! 127! In using some mesh3D routines, it is essential to understand that, in 128! fortran90, assumed-shape array arguments (like srcData and dstData in 129! addMeshData) are passed with their shape (i.e. their dimension in each 130! index) but NOT with the lower or upper bounds of these indexes. Thus, 131! element a(6,1) of an array dimensioned as a(5:10,2) in the calling program, 132! and as a(0:,0:) in the called subroutine, will be accessed as a(1,0) in the 133! latter. This is why arrays dimensioned as srcData(0:,0:,0:,:) may be (and 134! are expected to be) dimensioned in the calling program as 135! srcData(box(1,1):box(2,1),box(1,2):box(2,2),box(1,3):box(2,3),nData) 136!---------------------------- BEHAVIOUR --------------------------------------- 137! If dstDistr==0, it assumes serial execution, i.e. that dstData contains the 138! destination data values at all the mesh points in the unit-cell. 139! It stops with an error message in any of these cases: 140! - size(srcData,1:3) < srcBox(2,1:3)-srcBox(1,1:3)+1 141! - size(dstData,1:3) < dstBox(2,1:3)-dstBox(1,1:3)+1, with dstBox as 142! returned by call myMeshBox(nMesh,dstDistr,dstBox) 143! - size(srcData,4) /= size(dstData,4) 144! If task is present, it is automatically associated to dstDistr (see 145! associateMeshDistr). If dstDistr changes between calls (see freeMeshDistr), 146! task info is automatically erased. 147! If task is present and it contains a predefined value (returned by a previous 148! call to addMeshData or copyMeshData), but srcBox has changed, it stops with 149! an error message (since the communication task has also changed). To avoid 150! this, reset task to a negative value. If srcBox derives from a defined 151! mesh distribution, and task had been associated to it, task info will be 152! automatically deleted when the distrib changes, and no error will occur. 153!--------------------------- ALGORITHMS --------------------------------------- 154! The same as in copyMeshData, except that the destination boxes, rather than 155! the source boxes, are previously obtained from dstDistr. 156! 157!****************************************************************************** 158! 159! SUBROUTINE associateMeshTask( taskID, distrID1, distrID2 ) 160! 161! Associates a communication task to one or two parallel mesh distributions 162!------------------------------ INPUT ----------------------------------------- 163! integer distrID1 : ID of first distribution to be associated with task 164!-------------------------- OPTIONAL INPUT ------------------------------------ 165! integer distrID2 : ID of second distrribution to be associated 166!----------------------- INPUT and OUTPUT ------------------------------------- 167! integer taskID : ID of task to be associated with distribution(s) 168!----------------------------- USAGE ------------------------------------------ 169! use mesh3D, only: associateMeshTask, setMeshDistr 170! integer:: nMesh(3) 171! integer,save:: distr1=-1, distr2=-1, task1=-1, task2=-1, task12=-1, task21=-1 172! real(gp),allocatable:: workload(:,:,:) 173! ... Find nMesh. Allocate and find workload 174! call setMeshDistr( distr1, nMesh ) 175! call setMeshDistr( distr2, nMesh, workload ) 176! call associateMeshTask( task1, distr1 ) 177! call associateMeshTask( task2, distr2 ) 178! call associateMeshTask( task12, distr1, distr2 ) 179! call associateMeshTask( task21, distr1, distr2 ) 180!---------------------------- BEHAVIOUR --------------------------------------- 181! In serial execution (totNodes==1) it does nothing 182! If taskID is already associated with distrID1 (and distrID2, if present), 183! nothing is done. Thus, it is safe (and not inefficient) to make the same 184! call repeatedly. 185! If the taskID value has not been defined nor associated with any distribution, 186! it is initialized before associating it, and it returns changed. 187! If taskID is defined, but it is associated with two distributions different 188! from distrID1 (or from distrID1 and distrID2, if the latter is present), 189! the taskID is erased (freed) and reinitialized. 190! It stops with an error message in the following cases: 191! - If distrID1 (or distrID2 if present) has not been defined or if it has 192! been erased (freed). 193! - If the available maxDistrTasks slots, for tasks associated to distrID1 194! (or distrID2, if present), is overflooded. 195!--------------------------- ALGORITHMS --------------------------------------- 196! The taskID is an identifier for a parallel communication "task", defined as 197! a set of source and destination mesh boxes that belong to each processor. 198! The same task can be identified by different IDs (for example by different 199! calling programs, but the information is stored only once. This is ensured 200! by comparing the boxes of a new taskID with those of all previously defined 201! tasks. If they coincide, the taskID is simply assigned to that task. 202! When a taskID is erased (freed), the task itself is not erased, unless all 203! its IDs have been erased. There may be up to maxTasks possible simultaneous 204! tasks, each with up to maxTaskID IDs. 205! A task may be associated with up to two mesh distributions. This association 206! state is stored in both the task and the distribution(s) info. The 207! association means that the task's source or destination boxes are linked 208! to the distribution. This does not necessarily mean that the boxes are the 209! node's boxes in the distribution. For example, they might be "wings" to 210! the left or to the right of those boses. The key point is that the task 211! boxes will change when the distribution changes. 212! The purpose of the task-distribution association is to erase automatically 213! the task when the distribution is erased (freed). This is convenient 214! (and safer) because it allows not to have to free the task explicitly. 215! 216!****************************************************************************** 217! 218! SUBROUTINE copyMeshData( nMesh, srcDistr, srcData, dstBox, dstData, task ) 219! 220! Copies a box of values defined in a 3D mesh of a periodic unit cell. 221! The box limits may be entirely inside, or partially or totally outside 222! the unit cell, which is periodically repeated to fill the box. 223! In parallel execution, the routine handles the interprocessor transfers 224! needed to bring the requested box data from the nodes that store them. 225!------------------------------ INPUT ----------------------------------------- 226! integer nMesh(3) : Global mesh size in each axis 227! integer srcDistr : Distrib. ID of source data, or zero for serial exec. 228! real(gp) srcData(0:,0:,0:,:) : Source data values to be copied. Its upper 229! array bounds must be enough to contain the node box in 230! the srcDistr parallel distrib, or nMesh(iAxis)-1 in 231! serial execution. Real type gp is identical to grid_p, 232! defined in module precision. 233! integer dstBox(2,3) : Destination mesh box: box(1,iAxis)=lower box limits 234! box(2,iAxis)=upper box limits 235! The box may be partially or totally outside the first 236! unit cell (FUC). It is folded to the FUC in order to 237! assign the values of srcData to those of dstData. 238!----------------------------- OUTPUT ----------------------------------------- 239! real(gp) dstData(0:,0:,0:,:) : Destination data array (different from srcData) 240! The first three upper array bounds must be equal or 241! larger than dstBox(2,iAxis)-dst(1,iAxis). 242!-------------------- OPTIONAL INPUT and OUTPUT ------------------------------- 243! integer task : ID of communication task. It allows to save information on an 244! optimized interprocessor communication pattern, required to 245! exchange data between source and destination mesh boxes. 246! This info is used in future calls with the same srcBox and 247! dstDistr. task value must be saved between calls. See also 248! associateMeshTask. 249!----------------------------- UNITS ------------------------------------------ 250! Units of srcData and dstData are arbitrary but they must be equal 251!----------------------------- USAGE ------------------------------------------ 252! A typical usage for data redistribution in parallel execution is: 253! use precision, only: gp=>grid_p 254! use mesh3D, only: associateMeshTask, copyMeshData, myMeshBox, setMeshDistr 255! integer:: box1(2,3), box2(2,3), nData, nMesh(3) 256! integer,save:: distr1=-1, distr2=-1, task=-1 257! real(gp),allocatable:: data1(:,:,:,:), data2(:,:,:,:) 258! ... Find/define nMesh, box1 and box2 (so that they are two partitions of mesh) 259! call setMeshDistr( distr1, nMesh, box1 ) 260! call setMeshDistr( distr2, nMesh, box2 ) 261! allocate( data1(box1(1,1):box1(2,1), & 262! box1(1,2):box1(2,2), & 263! box1(1,3):box1(2,3),1:nData) ) 264! allocate( data2(box2(1,1):box2(2,1), & 265! box2(1,2):box2(2,2), & 266! box2(1,3):box2(2,3),1:nData) ) 267! ... Find data1 in box1 268! call associateMeshTask( task, distr1, distr2 ) 269! call copyMeshData( nMesh, distr1, data1, box2, data2, task ) 270! This usage is still valid in serial execution, although it makes little 271! sense in this case, since both mesh distributions would be identical. 272! The optimization of the communication task is done at the end of the first 273! call with given srcDistr and dstBox. Therefore, the use of the task argument 274! is not advised unless the same srcDistr and dstBox are used repeatedly. 275!---------------------------- BEHAVIOUR --------------------------------------- 276! If srcDistr==0, it assumes serial execution, i.e. that srcData contains the 277! source data values at all the mesh points in the unit-cell. 278! It stops with an error message in any of these cases: 279! - size(srcData,1:3) < srcBox(2,1:3)-srcBox(1,1:3)+1, with srcBox as 280! returned by call myMeshBox(nMesh,srcDistr,srcBox) 281! - size(dstData,1:3) < dstBox(2,1:3)-dstBox(1,1:3)+1 282! - size(srcData,4) /= size(dstData,4) 283! If task is present, it is automatically associated to srcDistr (see 284! associateMeshDistr). If srcDistr changes between calls (see freeMeshDistr), 285! task info is automatically erased. 286! If task is present and it contains a predefined value (returned by a previous 287! call to addMeshData or copyMeshData), but dstBox has changed, it stops with 288! an error message (since the communication task has also changed). To avoid 289! this, reset task to a negative value. If dstBox derives from a defined 290! mesh distribution (as in the example above), and task had been associated 291! to it, task info will be automatically deleted when the distrib changes, 292! and no error will occur. 293!--------------------------- ALGORITHMS --------------------------------------- 294! - The source boxes are first obtained from srcDistr. 295! - All the nodes' source and destination boxes are gathered, redistributed, 296! and stored in each other node. These boxes are also identified by a taskID, 297! which is returned, for retrieval in future calls with the same task 298! (defined by all the source and destination boxes themselves). 299! - A universal (dependent only on totNodes) ordered sequence of all-to-all 300! inter-node communication transfers is found. It is designed to minimize 301! communication collisions. It is based on timing together transfers 302! between processors separated by a given index span. 303! - In the first call, the true, nonempty transfers for a given task are 304! selected. Then, the order of the true transfers is (will be) reoptimized 305! based on a graph-theory algorithm. This optimized sequence is then stored 306! as part of the task information. 307! - For each transfer (true or temptative): 308! - The source box (my node's in sending transfers, or that of the sending 309! node, in receiving transfers) and the destination box (mine when 310! receiving, or the receiving node's when sending) are each divided in 311! nParts, each in a different periodic unit cell. These part boxes are 312! then shifted to the first unit cell (FUC). 313! - For each pair of the partial source and destination boxes, the 314! intersection between their FUC images is found. If the intersection is 315! empty (only happens if task info is not available), nothing is done. 316! - When sending, the corresponding source data values are added to a buffer, 317! that is finally sent with all parts together. When receiving, the buffer 318! is splitted in the corresponding intersection parts and added to the 319! destination data array. 320! 321!****************************************************************************** 322! 323! SUBROUTINE fftMeshDistr( nMesh, fftDistr, axisDistr ) 324! 325! Creates and returns a homogeneaous 3D mesh distribution fftDistr, and three 326! 2D distributions axisDistr, whose boxes extend wholly over one of the three 327! cell axes. These axisDistr allow fully local 1D FFTs in one of the axes. 328! The fftDistr and axisDistr are designed to make already optimal (without any 329! collisions) the default sequence of inter-node communications, returned by 330! all2allTransferOrder. This is beacause all simultaneous communications occur 331! between nodes separated by the same span (difference between node indexes). 332! Thus, no task arguments are required to optimize such communications. 333!------------------------------ INPUT ----------------------------------------- 334! integer nMesh(3) : Mesh divisions in each axis (including siesta "subpoints") 335!------------------------- INPUT and OUTPUT ----------------------------------- 336! integer fftDistr : ID of parallel 3D mesh distribution to be used in 3D FFTs 337!--------------------- OPTIONAL INPUT and OUTPUT ------------------------------ 338! integer axisDistr(3) : 2D distributions for the 1D FFTs required in 3D FFTs 339!----------------------------- USAGE ------------------------------------------ 340! Typical usage to find my box of k points: 341! use mesh3D, only: fftMeshDistr, myMeshBox 342! integer:: kBox(2,3), nMesh(3) 343! integer,save:: fftDistr=-1 344! ...Find nMesh 345! call fftMeshDistr( nMesh, fftDistr ) 346! call myMeshBox( nMesh, fftDistr, kBox ) 347!---------------------------- BEHAVIOUR --------------------------------------- 348! In serial execution (totNodes==1) it simply returns fftDistr=0. 349! If the input value of fftDistr (and axisDistr, if present) are already 350! defined for that nMesh, it returns without changing them. In this case, it 351! does NOT check that the distributions are suited for FFTs. 352! Since it calls setMeshDistr, some of the behaviour of that routine is 353! inherited by fftMeshDistr. 354!--------------------------- ALGORITHMS --------------------------------------- 355! First, it uses setMeshDistr, with default input parameters, to create an 356! optimized uniform 3D distr fftDistr. 357! Then, if axisDistr is present: for the x axis, it divides each row of node 358! boxes along that axis (say nNodeX boxes) uniformly in the other two axes. 359! Then it assigns one of these new long boxes (that extend along the whole x 360! axis) to each of the nNodeX processors, that owned the primitive 3D boxes. 361! This new boxes define axisDistr(1). The same is done for the y and z axes 362! (stricly, the axes are the cell axes, not x, y, and z). 363! 364!****************************************************************************** 365! 366! SUBROUTINE freeMeshDistr( distrID ) 367! 368! Erases a mesh distribution ID and frees the distribution slot if 369! no other IDs are assigned to it 370! 371!------------------------------ INPUT ----------------------------------------- 372! integer distrID ! ID of an allocated distribution 373!----------------------------- USAGE ------------------------------------------ 374! use mesh3D, only: free MeshDistr, setMeshDistr 375! integer:: nMesh(3) 376! integer,save:: myDistr=-1 377! ...Find nMesh 378! call setMeshDistr( nMesh, myDistr ) 379! ...Use myDistr 380! call freeMeshDistr( myDistr ) 381!---------------------------- BEHAVIOUR --------------------------------------- 382! If distrID is not defined, it returns without doing anything 383! If the distribution has only that distrID, its information is erased. 384! Otherwise, only that particular ID is removed from it. 385!--------------------------- ALGORITHMS --------------------------------------- 386! See setMeshDistr 387! 388!****************************************************************************** 389! 390! SUBROUTINE myMeshBox( nMesh, distrID, box ) 391 392! Finds the mesh box of the local processor in a parallel mesh distribution. 393! Equivalent to nodeMeshBox with node=myNode 394!------------------------------ INPUT ----------------------------------------- 395! integer nMesh(3) : Mesh divisions in each axis 396! integer distrID : Mesh-distribution ID 397!----------------------------- OUTPUT ----------------------------------------- 398! integer box(2,3) : Mesh box: box(1,:)=lower bounds, box(2,:)=upper bounds 399!----------------------------- USAGE ------------------------------------------ 400! Same as nodeMeshBox but without need of using myNode 401!---------------------------- BEHAVIOUR --------------------------------------- 402! Same as nodeMeshBox 403!--------------------------- ALGORITHMS --------------------------------------- 404! It simply calls nodeMeshBox with node=myNode 405! 406!****************************************************************************** 407! 408! SUBROUTINE nodeMeshBox( nMesh, distrID, node, box ) 409! 410! Finds the mesh box belonging to any node in a parallel mesh distribution, 411!------------------------------ INPUT ----------------------------------------- 412! integer nMesh(3) : Mesh divisions in each axis 413! integer distrID : Mesh-distribution ID 414!----------------------------- OUTPUT ----------------------------------------- 415! integer box(2,3) : Mesh box: box(1,:)=lower bounds, box(2,:)=upper bounds 416!----------------------------- USAGE ------------------------------------------ 417! Typical usage to find my box of mesh points: 418! use mesh3D, only: nodeMeshBox 419! use parallel, only: myNode=>node 420! integer:: myBox(2,3), nMesh(3) 421! integer,save:: myDistr=-1 422! ...Find nMesh and myDistr 423! call nodeMeshBox( nMesh, myDistr, myNode, myBox ) 424!---------------------------- BEHAVIOUR --------------------------------------- 425! If iDistr=0, it returns box(1,:)=0 and box(2,:)=nMesh(:)-1 426! Note: although nMesh is available from the distr. data, it is also included 427! in the routine arguments, to handle directly the case iDistr=0. 428! If node stores no mesh points, its (empty) box returns with box(1,:)>box(2,:) 429!--------------------------- ALGORITHMS --------------------------------------- 430! The corresponding box is simply retrived from the ditrib. data stored. 431! 432!****************************************************************************** 433! 434! SUBROUTINE redistributeMeshData( srcDistr, srcData, dstDistr, dstData, task ) 435! 436! Copies data values on the mesh with a new mesh distribution among nodes 437!------------------------------ INPUT ----------------------------------------- 438! integer srcDistr : Distrib. ID of srcData 439! real(gp) srcData(:,:,:,:) : Source data. Must be declared target or pointer. 440! integer dstDistr : Distrib. ID of dstData 441!------------------------- OUTPUT POINTER ------------------------------------- 442! real(gp) dstData(:,:,:,:) : Output destination array 443! : (it must be different from srcData) 444!--------------------- OPTIONAL INPUT and OUTPUT ------------------------------ 445! integer, task : ID of communication task 446!----------------------------- UNITS ------------------------------------------ 447! Units of srcData and dstData are arbitrary but they must be equal 448!----------------------------- USAGE ------------------------------------------ 449! Typical usage to 'translate' between two distributions: 450! use precision, only: gp=>grid_p 451! use mesh3D, only: associateMeshTask, redistributeMeshData 452! integer:: nData, newBox(2,3), nMesh(3) 453! integer,save:: newDistr=-1, old2new=-1, oldDistr=-1 454! real(gp),pointer:: newData(:,:,:,:), oldData(:,:,:,:) 455! ...Find nMesh, nData, oldDistr. Allocate and find oldData 456! call setMeshBox( newDistr, nMesh ) 457! call myMeshBox( nMesh, newDistr, newBox ) 458! allocate( newData( newBox(1,1):newBox(2,1), & 459! newBox(1,2):newBox(2,2), & 460! newBox(1,3):newBox(2,3), nData ) 461! call associateMeshTask( old2new, oldDistr, newDistr ) 462! call redistributeMeshData( oldDistr, oldData, newDistr, newData, old2new ) 463! 464! In serial execution, srcData and dstData will be the same physical array 465! upon return. This allows to write a common serial-parallel code, without 466! increasing the required memory in the serial case. But this must be handled 467! with much care, to avoid erasing or overwritting dstData inadvertently. 468! Therefore, the use of copyMeshData is prefferable in most other situations. 469!---------------------------- BEHAVIOUR --------------------------------------- 470! If srcDistr==0 and dstDistr==0 (serial execution) it simply makes dstData 471! point to srcData. This allows to save memory, but it must be handled with 472! care in calling program. 473! If the two distributions are equal (sameMeshDistr(srcDistr,dstDistr)==.true.) 474! it simply (re)allocates dstData and copies dstData=srcData. Othewise, 475! it calls copyMeshData after the (re)allocation of dstData. 476!--------------------------- ALGORITHMS --------------------------------------- 477! First, the dstDistr mesh box is obtained, then dstData is (re)allocated and 478! finally, copyMeshData is used to obtain dstData from srcData 479! 480!****************************************************************************** 481! 482! logical FUNCTION sameMeshDistr( ID1, ID2 ) 483! 484! Compares two distribution IDs 485!------------------------------ INPUT ----------------------------------------- 486! integer ID1, ID2 ! Mesh distribution IDs to be compared 487!----------------------------- USAGE ------------------------------------------ 488! use mesh3D, only: sameMeshDistr 489! integer,save:: oldDistr=-1, newDistr=-1 490! ... Find oldDistr and newDistr 491! if (.not.sameMeshDistr(oldDistr,newDistr)) do something 492!---------------------------- BEHAVIOUR --------------------------------------- 493! In serial execution (ID1=ID2=0), it returns .true. 494! If ID1 and ID2 are equal, but not defined distributions, it returns .false. 495!--------------------------- ALGORITHMS --------------------------------------- 496! In parallel execution, it looks for the distributions that own the given IDs, 497! then it checks whether all their mesh boxes are the same. 498! 499!****************************************************************************** 500! 501! SUBROUTINE setMeshDistr( distrID, nMesh, box, firstNode, nNodes, & 502! nNodesX, nNodesY, nNodesZ, nBlock, & 503! wlDistr, workload ) 504! 505! Defines a parallel distribution of mesh points over processor nodes. 506!------------------------------ INPUT ----------------------------------------- 507! integer nMesh(3) : Mesh divisions in each axis (including siesta "subpoints") 508!------------------------- OPTIONAL INPUT ------------------------------------- 509! integer box(2,3) : Mesh box of my processor node: 510! box(1,iAxis)=lower box limits, in range (0:nMesh(iAxis)-1) 511! box(2,iAxis)=upper box limits, in same range 512! integer firstNode : First node in the mesh distr. 513! integer nNodes : Total nodes in the mesh distr 514! integer nNodesX : Nodes in the X (first) axis. 515! Must be present if present(nNodesYZ) 516! integer nNodesY : Nodes in the Y (second) axis. 517! Must be present if present(nNodesZ) 518! integer nNodesZ : Nodes in the Z (third) axis 519! integer nBlock : Size of blocks of mesh points, in each axis, which are 520! not splitted in different nodes. It must be a factor of 521! all of nMesh(1:3). If box is also present, nBlock must be 522! a factor of all box(1,1:3) and box(2,1:3)+1. 523! nBlock corresponds to nsm (lateral size of "superpoints") 524! in siesta mesh terminology. 525! integer wlDistr : Distr. index of workload array 526! real(gp) workload(0:,0:,0:) ! Approx. relative workload of mesh points. 527! Must be nonnegative at all points and have nonzero sum. 528!-------------------------- INPUT and OUTPUT ---------------------------------- 529! integer distrID : ID assigned to the mesh distrib. 530!----------------------------- USAGE ------------------------------------------ 531! Arguments box, firstNode, nNodes, nNodesXYZ, and nBlock are provided to 532! force compatibility with user distributions, but they should be avoided 533! otherwise, since they difficult the optimal distribution of mesh points. 534! 535! Typical usage to create a uniform 3D mesh distribution: 536! use mesh3D, only: setMeshDistr 537! integer:: nMesh(3) 538! integer,save:: myDistr=-1 539! ... Find nMesh 540! call setMeshDistr( myDistr, nMesh ) 541! 542! Typical usage to distribute evenly the workload of each node: 543! use mesh3D, only: setMeshDistr 544! integer:: nMesh(3) 545! integer,save:: newDistr=-1, oldDistr=-1 546! real(gp),allocatable:: workload(:,:,:) 547! ... Find nMesh 548! call setMeshDistr( oldDistr, nMesh ) 549! call myMeshBox( oldDistr, nMesh, box ) 550! allocate( workload(box(1,1):box(2,1), & 551! box(1,2):box(2,2), & 552! box(1,3):box(2,3)) ) 553! ... Find approximate CPU workload associated to each point of box 554! call setMeshDistr( newDistr, nMesh, wlDistr=oldDistr, workload=workload ) 555! 556! The mesh distribution of J.D.Gale, used up to siesta-2.5, is set by: 557! use parallel, only: ProcessorY 558! use mesh, only: nsm 559! use mesh3D, only: setMeshDistr 560! integer:: nMesh(3) 561! integer,save:: JDGdistr=-1 562! call setMeshDistr( JDGdistr, nMesh, nNodesX=1, nNodesY=ProcessorY, 563! nBlock=nsm ) 564!---------------------------- BEHAVIOUR --------------------------------------- 565! In serial execution (totNodes==1) it simply returns distrID=0, irrespective 566! of all arguments. Parameter totNodes is obtained from module parallel. 567! If the input distribution ID is still valid (i.e. consistent with the other 568! input arguments), the same value is returned. If it points to an existing 569! distribution that is no longer consistent, the old distribution ID is 570! freed before returning with a new distrID. This makes it convenient to 571! make succesive calls with the same distrID but different other arguments 572! (e.g. different workloads in different iterations). 573! New IDs are never repeated, even if they identify the same distribution. 574! If box is present, all other optional arguments are ignored. The different 575! node mesh boxes should be a nonoverlapping partition of all the mesh 576! points, but this is NOT checked. 577! If firstNode is not present, firstNode=0 is assumed. 578! If nNodes is not present all nodes are used (nNodes=totNodes). 579! It stops with an error message in the following cases: 580! - nNodes is present and it is smaller than 1 or larger than totNodes. 581! - nNodesZ is present but either nNodesX or nNodesY are not present. 582! - nNodesY is present but nNodesX is not present. 583! - nBlock is present and it is not a factor of one of nMesh(1:3). 584! - box and nBlock are present, and nBlock is not a factor of any of 585! box(1,1:3) or box(2,1:3)+1. 586! - workload is present but wlDistr is not present. 587! - workload is present and negative at any mesh point, or zero at all 588! points of the unit cell mesh. 589! If nNodesXYZ are not present, the distribution of nodes over each axis is 590! optimized, in the sense of leading to node mesh boxes as cubic as possible. 591! If only nNodesX is present, it is optimized over the Y and Z axes. 592! If nBlock is not present, nBlock=1 is assumed. 593! If workload is present, its distribution over processors is optimized, in the 594! sense of load balance. If it is not present, the even distribution of mesh 595! points is optimized (equivalent to using the same workload for all points). 596!--------------------------- ALGORITHMS --------------------------------------- 597! For uniform distributions (workload not present): 598! - nNodes is first fatorized in its prime factors. 599! - All possible distributions of (products of) factors over axes are tried, 600! and that with lowest dispersion of nMesh(axis)/factor(axis) is selected. 601! This is done only over axes not constrained by nNodesXYZ, if present, and 602! respecting that axis factors must be multiples of nBlock, if present. 603! - nMesh(axis) is divided in nNodes(axis) boxes. If there is a rest, the first 604! rest(axis) of the nNodes(axis) are given an extra point. Again, this is 605! done with blocks of nBlock points (if present) rather than single points. 606! 607! For nonuniform distributions (workload present): 608! - nNodes is first fatorized in its prime factors. 609! - Beginning with a box of the entire cell size, each box is divided in 610! nParts=factor (in order of decreasing factors). The division axis is that 611! along which the spatial dispersion of the box workload is maximum. The 612! division surfaces are chosen to split the workload as evenly as possible. 613! If the division surface is in a void region (with zero workload), it is 614! placed at the midpoint between the nonzero workload regions. 615! - The divided boxes are assigned to a contiguous set of 616! boxNodes=boxNodes/nParts. This process is repeated until boxNodes=1. 617! 618! A stored distribution is defined by nMesh and its node boxes. There may be up 619! to maxDistr distributions defined simultaneously. Each distribution may be 620! identified simultaneously by up to maxDistrID identifiers (for example by 621! different calling routines), but its information is stored only once. 622! This is ensured by comparing the boxes of a new distrID with those of all 623! previously defined distributions. If they coincide, the distrID is simply 624! added to the existing distribution. 625! Each distrID is never repeated. When a distrID is erased (freed), the 626! distribution itself is not erased, unless all its IDs have been erased. 627! This prevents that a calling routine may erase a distribution that is 628! still being used by other routines. 629! 630!****************************************************************************** 631 632MODULE mesh3D 633 634! Used module procedures 635 use alloc, only: de_alloc ! De-allocation routine 636 use alloc, only: re_alloc ! Re-allocation routine 637 use sys, only: die ! Termination routine 638 use sorting, only: ordix ! Sorting routine 639 640! Used module parameters 641 use precision, only: dp ! Real double precision type 642 use precision, only: gp=>grid_p ! Grid-data real precision type 643! use parallel, only: myNode=>node ! This process node 644! use parallel, only: totNodes=>nodes ! Total number of processor nodes 645#ifdef DEBUG_XC 646 use debugXC, only: udebug ! Output file unit for debug info 647#endif /* DEBUG_XC */ 648 649 ! Copy and add for limiting temporary arrays 650 use m_array, only: array_copy, array_add 651 652! Used MPI procedures and types 653#ifdef HAVE_MPI 654 use mpi 655 use precision, only: MPI_grid_real 656#endif 657 658 ! MPI variables 659#ifdef HAVE_MPI 660 use gridxc_config, only: comm => gridxc_comm 661#endif 662 use gridxc_config, only: myNode => gridxc_myNode 663 use gridxc_config, only: totNodes => gridxc_totNodes 664 665 implicit none 666 667! Public procedures 668PUBLIC:: & 669 addMeshData, &! Adds the data in a box array to the equiv. unit-cell points 670 associateMeshTask, &! Associates a communication task to mesh distr. 671 copyMeshData, &! Copies data in a unit-cell array to another mesh box array 672 fftMeshDistr, &! Sets four mesh distributions to perform FFTs 673 freeMeshDistr, &! Frees a distribution index (they are limited to maxDistr) 674 freeMeshTask, &! Frees a task index (they are limited to maxTasks) 675 myMeshBox, &! Returns mesh box of my processor node 676 nodeMeshBox, &! Returns mesh box of any given processor node 677 redistributeMeshData, &! Copies data between different mesh distributions 678 sameMeshDistr, &! Finds if two mesh distributions are equal 679 setMeshDistr ! Defines a distribution of mesh points over parallel nodes 680 681! Public types, parameters, or variables: 682! none 683 684PRIVATE ! Nothing is declared public beyond this point 685 686 ! Private module parameters 687 character(len=*),parameter:: moduleName = 'mesh3D ' 688 integer,parameter:: maxDistr = 20 ! Max. number of mesh distributions 689 ! allocated at any given time 690 integer,parameter:: maxTasks = 100 ! Max. number of communication tasks 691 ! allocated at any given time 692 integer,parameter:: maxDistrID = 20 ! Max. IDs assigned to the same distrib. 693 integer,parameter:: maxTaskID = 10 ! Max. IDs assigned to the same task 694 integer,parameter:: maxParts = 125 ! Max. parts of a mesh box 695 integer,parameter:: maxDistrTasks = 50 ! Max. communication tasks associated 696 ! to a parallel mesh distribution 697 698 ! Private type to hold mesh distribution data 699 type distrType 700 private 701 logical:: defined=.false. ! Has this distribution been defined? 702 integer:: ID(maxDistrID)=-1 ! ID numbers assigned to the distribution 703 integer:: nNodes=-1 ! Number of nodes participating in the distrib. 704 integer:: firstNode=-1 ! First of the consecutive nNodes in the distr. 705 integer:: nMesh(3)=-1 ! Number of total mesh divisions in each axis 706 integer:: task(maxDistrTasks)=-1 ! Communic. tasks associated to distr. 707 integer,pointer:: box(:,:,:)=>null() ! Mesh box bounds of each node: 708 ! box(1,iAxis,iNode)=lower bounds 709 ! box(2,iAxis,iNode)=upper bounds 710 end type distrType 711 712 ! Private type to hold a mesh communication task 713 type taskType 714 private 715 logical:: defined=.false. ! Has this task been defined? 716 logical:: optimized=.false. ! Has the transfer order been optimized? 717 logical:: associated=.false. ! Has this task been associated to any distr? 718 integer:: ID(maxTaskID)=-1 ! ID numbers assigned to the distribution 719 integer:: distr(2)=-1 ! Distr. indexes to which task is associated 720 integer:: nTrsf=0 ! Number of needed communication transfers 721 integer,pointer:: srcBox(:,:,:)=>null() ! Source mesh box of each node. 722 integer,pointer:: dstBox(:,:,:)=>null() ! Destination mesh box of each node. 723 integer,pointer:: trsfNode(:) ! Sequence of to/from transfer nodes 724 integer,pointer:: trsfDir(:) ! Sequence of to/from transfer directions 725 end type taskType 726 727 ! Private array that will hold the different distributions data 728 type(distrType),target,save:: storedMeshDistr(maxDistr) 729 730 ! Private array that will hold the different communication tasks 731 type(taskType),target,save:: storedMeshTask(maxTasks) 732 733 ! Private counters 734 integer,save:: nDistrID = 0 ! Number of assigned distribution IDs 735 integer,save:: nTaskID = 0 ! Number of assigned mesh communication task IDs 736 737 738#ifdef HAVE_MPI 739 integer:: MPIerror, MPIstatus(MPI_STATUS_SIZE) 740#endif 741 742CONTAINS 743 744!---------------------------------------------------------------------------- 745 746subroutine addMeshData( nMesh, srcBox, srcData, dstDistr, dstData, task ) 747 748! Adds (reduces) the values defined in a 3D mesh box to the equivalent mesh 749! points in a periodic unit cell. 750 751 implicit none 752 integer, intent(in) :: nMesh(3) ! Global mesh size in each axis 753 integer, intent(in) :: srcBox(2,3) ! Source mesh box: 754 ! box(1,iAxis)=lower box limits 755 ! box(2,iAxis)=upper box limits 756 ! Box may be larger than the 757 ! (0:nMesh-1) unit-cell range 758 real(gp),intent(in) :: srcData(0:,0:,0:,:) ! Source data values to be copied. 759 ! The first three upper array bounds 760 ! must be equal or larger than 761 ! srcBox(2,iAxis)-src(1,iAxis) 762 integer, intent(in) :: dstDistr ! Distrib. ID of destination data 763 ! or zero for serial execution 764 real(gp),intent(inout):: dstData(0:,0:,0:,:) ! Destination data array 765 ! Its upper array bounds must be 766 ! enough to contain the node box in 767 ! the dstDistr parallel distrib. 768 ! or nMesh(iAxis)-1 in serial exe 769 integer,intent(inout),optional:: task ! ID of communication task 770 771 integer :: dstBox(2,3) 772 773! Associate task to distribution 774 if (present(task)) call associateMeshTask( task, dstDistr ) 775 776! Find box limits of destination data in this node 777 call myMeshBox( nMesh, dstDistr, dstBox ) 778 779! Add srcData to dstData 780 call reduceData( nMesh, srcBox, srcData, dstBox, dstData, taskID=task ) 781 782end subroutine addMeshData 783 784!------------------------------------------------------------------------------- 785 786subroutine all2allTransferOrder( nNodes, node, nTrsf, trsfNode, trsfDir ) 787 788! Finds an ordered sequence of matching all-to-all send-receive transfers 789! that avoids collisions and that approximately minimizes total transfer time 790! Algorithm: 791! At any given 'time' (index of the ordered sequence), half of the nodes send 792! right-wards (to higher nodes) and the other half receives from leftwards, 793! with a fixed internode span, modulo nNodes2 (defined below). The sending/ 794! receiving nodes belong to odd/even blocks (or viceversa) of consecutive nodes, 795! whose blockSize is the power-of-two factor of the span (i.e. span is the 796! product of the blocSize times an odd number). nNodes2 is the smallest 797! multiple of 2*blockSize equal to, or larger than nNodes. This ensures opposite 798! parities of the blocks of sendNode and recvNode=modulo(sendNode+span,nNodes2). 799 800 implicit none 801 integer,intent(in) :: nNodes ! Number of parallel processor nodes. The 802 ! number of transfers per node is 2*nNodes-1 803 integer,intent(in) :: node ! Local processor node, in range (0:nNodes-1) 804 integer,intent(out):: nTrsf ! Number of required transfers 805 integer,intent(out):: trsfNode(0:2*(nNodes-1)) ! To/from transfer node. 806 ! First 'transfer' is local: trsfNode(0)=node 807 integer,intent(out):: trsfDir(0:2*(nNodes-1)) ! Transfer direction: 808 ! trsfDir=+1 => from node to trsfNode 809 ! trsfDir=-1 => from trsfNode to node 810 ! trsfDir=0 => local 'transfer' within node 811 812 integer:: blockSize, iTrsf, nNodes2, nodePar, oddFac, pow2, span, trsfPar 813 814 ! Set num. of transfers needed (a send and a receive to/from each other node) 815 nTrsf = 2*(nNodes-1) 816 817 ! Special local transfer 818 trsfNode(0) = node 819 trsfDir(0) = 0 820 if (nNodes==1) return 821 822 ! Loop on block sizes (powers of 2). iTrsf is the transfer-order index. 823 iTrsf = 1 824 do pow2 = 0,int(log(real(nNodes))/log(2.))+1 825 blockSize = 2**pow2 826 ! Find smallest multiple of 2*blockSize equal to, or larger than nNodes 827 nNodes2 = ((nNodes-1)/(2*blockSize)+1) * (2*blockSize) 828 ! Find parity of node's block 829 nodePar = mod(node/blockSize,2) 830 ! Loop on odd factors of span. Maximum span is nNodes2-1 831 ! Fake nodes are 'added' between nNodes and nNodes2-1 832 do oddFac = 1,nNodes2/blockSize,2 833 span = blockSize * oddFac 834 ! Loop on transfer 'parity' (direction) 835 do trsfPar = 0,1 836 ! Find transfer sense, and the node sending/receiving to/from node 837 ! All transfers are rightwards, with transfer node modulo mNodes2 838 if (nodePar==trsfPar) then ! node sends 839 trsfNode(iTrsf) = modulo( node+span, nNodes2 ) 840 trsfDir(iTrsf) = +1 841 else ! node receives 842 trsfNode(iTrsf) = modulo( node-span, nNodes2 ) 843 trsfDir(iTrsf) = -1 844 end if ! (nodePar==trsfPar) 845 ! If trsfNode is within range (not a fake node), accept this transfer 846 if (trsfNode(iTrsf)<nNodes) iTrsf = iTrsf + 1 847 ! Finish when all transfers have been found 848 if (iTrsf>2*(nNodes-1)) return 849 end do ! trsfPar 850 end do ! oddFact 851 end do ! pow2 852 853end subroutine all2allTransferOrder 854 855!------------------------------------------------------------------------------- 856 857subroutine associateMeshTask( taskID, distrID1, distrID2 ) 858 859! Associates a communication task to one or two parallel mesh distributions 860 861 implicit none 862 integer, intent(inout):: taskID ! ID of task to be associated 863 integer, intent(in) :: distrID1 ! ID of first distr, to be assoc. 864 integer,optional,intent(in) :: distrID2 ! ID of second distr. to be assoc. 865 866 character(len=*),parameter:: myName = 'associateMeshTask ' 867 character(len=*),parameter:: errHead = myName//'ERROR: ' 868 logical:: found 869 integer:: iDistr1, iDistr2, it, iTask 870 type(taskType), pointer :: task 871 type(distrType), pointer :: distr1, distr2 872 873! In serial execution, do nothing 874 if (totNodes<2) return 875 876! Find distribution(s) 877 iDistr1 = indexDistr( distrID1 ) 878 if (iDistr1<0) call die(errHead//'distrID1 not defined') 879 distr1 => storedMeshDistr(iDistr1) ! Just a shorter name 880 if (present(distrID2)) then 881 iDistr2 = indexDistr( distrID2 ) 882 if (iDistr2<0) call die(errHead//'distrID2 not defined') 883 distr2 => storedMeshDistr(iDistr2) 884 end if ! (present(distrID2)) 885 886! Find task 887 found = .false. 888 iTask = indexTask( taskID ) 889 if (iTask>0) then ! Task exists already 890 task => storedMeshTask(iTask) ! Just a shorter name 891 found = .true. 892 if (.not.any(iDistr1==task%distr)) found = .false. 893 if (present(distrID2)) then 894 if (.not.any(iDistr2==task%distr)) found = .false. 895 end if 896 if (.not.found) & ! Free existing task ID, since it has new associations 897 call freeMeshTask( taskID ) 898 end if ! (iTask>0) 899 900! Create new task, if necessary 901 if (found) then 902 return ! Since task is alredy associated to same distribution(s) 903 else ! Task did not exist or has been eliminated => create it 904 call initTask( taskID ) 905 iTask = indexTask( taskID ) 906 task => storedMeshTask(iTask) 907 end if ! (found) 908 909! Store association to first distribution 910 task%distr(1) = iDistr1 ! Associate distribution to task 911 if (.not.any(distr1%task==iTask)) then ! Associate task to distribution 912 found = .false. 913 do it = 1,maxDistrTasks 914 if (distr1%task(it)<0) then ! Empty slot 915 distr1%task(it) = iTask 916 found = .true. 917 exit ! it loop 918 end if 919 end do ! it 920 if (.not.found) call die(errHead//'parameter maxDistrTasks too small') 921 end if ! (.not.any(distr1%task==iTask)) 922 923! Store association to second distribution, if different from first one 924 if (present(distrID2) .and. iDistr1/=iDistr2) then 925 task%distr(2) = iDistr2 926 if (.not.any(distr2%task==iTask)) then 927 found = .false. 928 do it = 1,maxDistrTasks 929 if (distr2%task(it)<0) then ! Empty slot 930 distr2%task(it) = iTask 931 found = .true. 932 exit ! it loop 933 end if 934 end do ! it 935 if (.not.found) call die(errHead//'parameter maxDistrTasks too small') 936 end if ! (.not.any(distr2%task==iTask)) 937 end if ! (present(distrID2)) 938 939! Mark task as already associated 940 task%associated = .true. 941 942#ifdef DEBUG_XC 943! if (present(distrID2)) then 944! write(udebug,'(a,2i4,2(2x,2i4))') & 945! myName//'taskID,iTask,distrID,iDistr=', & 946! taskID, iTask, distrID1, iDistr1, distrID2, iDistr2 947! write(udebug,'(a,i3,2x,25i3,/,(36x,25i3))') myName//'iDistr,tasks=', & 948! iDistr1, pack(distr1%task,distr1%task>0) 949! write(udebug,'(a,i3,2x,25i3,/,(36x,25i3))') myName//'iDistr,tasks=', & 950! iDistr2, pack(distr2%task,distr2%task>0) 951! else 952! write(udebug,'(a,2i4,2x,2i4)') & 953! myName//'taskID,iTask,distrID,iDistr=', & 954! taskID, iTask, distrID1, iDistr1 955! write(udebug,'(a,i3,2x,25i3,/,(36x,25i3))') myName//'iDistr,tasks=', & 956! iDistr1, pack(distr1%task,distr1%task>0) 957! end if 958#endif /* DEBUG_XC */ 959 960end subroutine associateMeshTask 961 962!------------------------------------------------------------------------------- 963 964subroutine commonBox( nMesh, aBox, bBox, aComBox, bComBox, sizeSum, nParts ) 965 966! Finds the common intersection(s) of two mesh boxes, not necessarily contained 967! in first unit cell (FUC). Each box is folded in parts contained in FUC. 968! Then the intersection is obtained between each part or the folded aBox and 969! each part of the folded bBox and returned in nParts intersection parts. 970 971 implicit none 972 integer,intent(in) :: nMesh(3) ! Mesh divisions in each axis 973 integer,intent(in) :: aBox(2,3) ! First box, not necessarily within FUC 974 integer,intent(in) :: bBox(2,3) ! Second box, not necessarily within FUC 975 integer,intent(out):: aComBox(:,:,:) ! Partial intersection boxes, each one 976 ! contained in a single periodic cell, 977 ! relative to the aBox origin 978 integer,intent(out):: bComBox(:,:,:) ! Same partial intersection boxes, but 979 ! relative to the bBox origin 980 integer,intent(out):: sizeSum(0:) ! Sum of the sizes of the partial boxes 981 integer,intent(out):: nParts ! Number of partial boxes 982 983 character(len=*),parameter:: myName = moduleName//'commonBox ' 984 character(len=*),parameter:: errHead = myName//'ERROR: ' 985 integer:: aPart, aParts, aPartBox(2,3,maxParts), aPartBoxFUC(2,3,maxParts), & 986 bPart, bParts, bPartBox(2,3,maxParts), bPartBoxFUC(2,3,maxParts), & 987 comBox(2,3), comSize, mxParts 988 989! Find maximum number of parts allowed by size of output arrays 990 mxParts = min( size(aComBox,3), size(bComBox,3), size(sizeSum)-1 ) 991 992! Divide aBox and bBox in parts contained in a single unit cell, as well as 993! the equivalent part boxes in first unit cell (FUC) 994 call unitCellParts( nMesh, aBox, aPartBox, aPartBoxFUC, aParts ) 995 call unitCellParts( nMesh, bBox, bPartBox, bPartBoxFUC, bParts ) 996 997! Find intersection boxes common to aBox and bBox 998#ifdef DEBUG_XC 999! write(udebug,'(a,3(2i4,2x))') myName//' aBox=', aBox 1000! write(udebug,'(a,3(2i4,2x))') myName//' bBox=', bBox 1001#endif /* DEBUG_XC */ 1002 nParts = 0 1003 sizeSum(0) = 0 1004 do bPart = 1,bParts 1005 do aPart = 1,aParts 1006 comBox(1,:) = max( aPartBoxFUC(1,:,aPart), bPartBoxFUC(1,:,bPart) ) 1007 comBox(2,:) = min( aPartBoxFUC(2,:,aPart), bPartBoxFUC(2,:,bPart) ) 1008 if (all(comBox(1,:)<=comBox(2,:))) then ! Nonempty intersection box 1009 nParts = nParts + 1 1010 if (nParts > mxParts) & 1011 call die(errHead//'size of aComBox, bComBox, or sizeSum too small') 1012 ! Sum size of common box 1013 comSize = product(comBox(2,:)-comBox(1,:)+1) 1014 sizeSum(nParts) = sizeSum(nParts-1) + comSize 1015 ! Find common box in the unfolded unit cell of aPartBox 1016 aComBox(:,:,nParts) = comBox(:,:) + & 1017 aPartBox(:,:,aPart) - aPartBoxFUC(:,:,aPart) 1018 ! Find common box in aBox origin. Index=1 in srcBox in both lines!!! 1019 aComBox(1,:,nParts) = aComBox(1,:,nParts) - aBox(1,:) 1020 aComBox(2,:,nParts) = aComBox(2,:,nParts) - aBox(1,:) 1021 ! Find common box in bBox origin 1022 bComBox(:,:,nParts) = comBox(:,:) + & 1023 bPartBox(:,:,bPart) - bPartBoxFUC(:,:,bPart) 1024 bComBox(1,:,nParts) = bComBox(1,:,nParts) - bBox(1,:) 1025 bComBox(2,:,nParts) = bComBox(2,:,nParts) - bBox(1,:) 1026#ifdef DEBUG_XC 1027! write(udebug,'(a,3(2i4,2x))') myName//' aPartBox=', & 1028! aPartBox(:,:,aPart) 1029! write(udebug,'(a,3(2i4,2x))') myName//' comBox=', & 1030! comBox(:,:) - aPartBox(:,:,aPart) + aPartBoxFUC(:,:,aPart) 1031! write(udebug,'(a,3(2i4,2x))') myName//' aComBox=', & 1032! aComBox(:,:,nParts) 1033! write(udebug,'(a,3(2i4,2x))') myName//' bComBox=', & 1034! bComBox(:,:,nParts) 1035#endif /* DEBUG_XC */ 1036 end if 1037 end do ! aPart 1038 end do ! bPart 1039 1040end subroutine commonBox 1041 1042!------------------------------------------------------------------------------- 1043 1044subroutine copyMeshData( nMesh, srcDistr, srcData, dstBox, dstData, task ) 1045 1046! Copies a box of values defined in a 3D mesh of a periodic unit cell. 1047 1048 implicit none 1049 integer, intent(in) :: nMesh(3) ! Global mesh size in each axis 1050 integer, intent(in) :: srcDistr ! Distrib. ID of source data 1051 ! or zero for serial execution 1052 real(gp),intent(in) :: srcData(0:,0:,0:,:) ! Source data values to be copied 1053 ! Its upper array bounds must be 1054 ! enough to contain the node box in 1055 ! the srcDistr parallel distrib. 1056 ! or nMesh(iAxis)-1 in serial exe 1057 integer, intent(in) :: dstBox(2,3) ! Destination mesh box: 1058 ! box(1,iAxis)=lower box limits 1059 ! box(2,iAxis)=upper box limits 1060 ! Box may be larger than the 1061 ! (0:nMesh-1) unit-cell range 1062 real(gp),intent(out):: dstData(0:,0:,0:,:) ! Destination data values. 1063 ! The upper array bounds must be 1064 ! dstBox(2,iAxis)-dst(1,iAxis) 1065 ! srcData and dstData must be 1066 ! different arrays 1067 integer,intent(inout),optional:: task ! ID of communication task 1068 1069 integer :: srcBox(2,3) 1070 1071! Associate task to distribution 1072 if (present(task)) call associateMeshTask( task, srcDistr ) 1073 1074! Find box limits of source data in this node 1075 call myMeshBox( nMesh, srcDistr, srcBox ) 1076 1077! Check that srcData and dstData are not identical already 1078! Avoid this in parallel, since different nodes might do different things 1079! if (all(srcBox==dstBox) .and. all(shape(srcData)==shape(dstData))) then 1080! if (all(srcData==dstData)) return 1081! end if 1082 1083! Copy data 1084 dstData = 0 1085 call reduceData( nMesh, srcBox, srcData, dstBox, dstData, taskID=task ) 1086 1087end subroutine copyMeshData 1088 1089!------------------------------------------------------------------------------- 1090 1091subroutine divideBox1D( box, nParts, partBox, blockSize, workload ) 1092 1093! Divides a 1D box according to workload (or uniformly if workload is absent) 1094 1095 implicit none 1096 integer, intent(in) :: box(2) 1097 integer, intent(in) :: nParts 1098! 1099! Avoid the creation of a temporary array by passing descriptor info 1100!!! integer, intent(out):: partBox(2,nParts) 1101 integer, intent(out):: partBox(:,:) 1102! 1103 integer, optional,intent(in) :: blockSize 1104 real(gp),optional,intent(in) :: workload(0:) 1105 1106 character(len=*),parameter:: myName = moduleName//'divideBox1D ' 1107 character(len=*),parameter:: errHead = myName//'ERROR: ' 1108 logical :: nextPart 1109 integer :: blckSize, boxSize, i, iPart, largerSize, last, & 1110 nBlocks, nLargerParts, partSize 1111 real(dp):: partWkld, wlSum 1112 1113! Trap a trivial case 1114 if (nParts==1) then 1115 partBox(:,1) = box 1116 return 1117 end if 1118 1119! Find box size 1120 boxSize = box(2) - box(1) + 1 1121 1122! Set block size 1123 if (present(blockSize)) then 1124 if (mod(boxSize,blockSize)/=0) & 1125 call die(errHead//'boxSize not a multiple of blockSize') 1126 blckSize = blockSize 1127 else 1128 blckSize = 1 1129 end if 1130 1131! Choose division mode 1132 if (present(workload)) then ! Divide box according to workload 1133 1134 ! Check size of workload array 1135 if (size(workload)/=boxSize) call die(errHead//'size(workload)/=boxSize') 1136 1137 ! Find intended workload of each part 1138 wlSum = sum(workload) 1139 partWkld = wlSum / nParts 1140 1141 ! Loop on parts 1142 partBox(1,1) = box(1) 1143 do iPart = 1,nParts-1 1144 last = 0 ! Last point with nonzero workload (or zero if none yet) 1145 do i = 0,boxSize-1 1146 if ( sum(workload(0:i)) > iPart*partWkld ) then ! Limit surpassed 1147 if (i==0 .or. last < i-1) then ! Place limit at center of vacuum 1148 partBox(2,iPart) = box(1) + (last+i)/2 1149 else if ( i==boxSize-1 .or. & 1150 abs(sum(workload(0:i-1))-iPart*partWkld) & 1151 < abs(sum(workload(0:i)) -iPart*partWkld) ) then 1152 partBox(2,iPart) = box(1) + i-1 1153 else 1154 partBox(2,iPart) = box(1) + i 1155 end if 1156 partBox(1,iPart+1) = partBox(2,iPart) + 1 1157 ! Avoid dividing blocks: parts must begin in a multiple of blockSize 1158 partBox(1,iPart+1) = partBox(1,iPart+1) & 1159 - mod( partBox(1,iPart+1), blckSize ) 1160 ! Prevent that iPart and iPart+1 overlap 1161 partBox(1,iPart+1) = max( partBox(1,iPart+1), & 1162 partBox(1,iPart)+blckSize ) 1163 partBox(2,iPart) = partBox(1,iPart+1) - 1 1164 ! Go to next part 1165 nextPart = .true. 1166 else ! Limit not surpassed 1167 nextPart = .false. 1168 end if ! ( sum(workload(axis,0:i)) > iPart*partWkld ) 1169 if (workload(i)>0._gp) last = i 1170 if (nextPart) exit ! i loop 1171 end do ! i 1172 end do ! iPart 1173 partBox(2,nParts) = box(2) 1174 1175 else ! (.not.present(workload)) => Divide box uniformly 1176 1177 nBlocks = boxSize / blckSize ! Number of blocks of points 1178 partSize = nBlocks / nParts ! Blocks per part 1179 nLargerParts = nBlocks - partSize*nParts ! Number of larger parts 1180 partSize = partSize * blckSize ! Points per part 1181 largerSize = partSize + blckSize ! One more block for larger parts 1182 partBox(1,1) = box(1) 1183 do iPart = 1,nLargerParts ! largerParts of size partSize+blockSize 1184 if (iPart>1) partBox(1,iPart) = partBox(2,iPart-1) + 1 1185 partBox(2,iPart) = partBox(1,iPart) + largerSize - 1 1186 end do 1187 do iPart = nLargerParts+1,nParts ! remaining parts of size partSize 1188 if (iPart>1) partBox(1,iPart) = partBox(2,iPart-1) + 1 1189 partBox(2,iPart) = partBox(1,iPart) + partSize - 1 1190 end do 1191 1192 end if ! (present(workload)) 1193 1194end subroutine divideBox1D 1195 1196!------------------------------------------------------------------------------- 1197 1198subroutine divideBox3D( nMesh, wlDistr, workload, box, & 1199 nParts, blockSize, axis, partBox ) 1200 1201! Divides a mesh box according to a given workload and axis. If axis<1 on 1202! input, it is chosen to maximize the spatial dispersion of workload 1203 1204 implicit none 1205 integer, intent(in) :: nMesh(3) ! Global mesh size in each axis 1206 integer, intent(in) :: wlDistr ! Mesh distribution ID of workload array 1207 real(gp),intent(in) :: workload(0:,0:,0:) ! Work load of each mesh point 1208 integer, intent(in) :: box(2,3) ! Mesh box to be divided 1209 integer, intent(in) :: nParts ! Number of parts of the divided box 1210 integer, intent(in) :: blockSize ! Size of blocks of mesh points in each 1211 ! axis that cannot be split among nodes 1212 integer, intent(inout):: axis ! Axis along which division is made 1213 integer, intent(out) :: partBox(2,3,nParts) ! Divided part boxes 1214 1215 character(len=*),parameter:: myName = moduleName//'divideBox3D ' 1216 character(len=*),parameter:: errHead = myName//'ERROR: ' 1217 integer :: boxShape(3), i, iPart, ix, maxShape 1218 real(dp):: maxDisp, wlDisp, wlSum, wlSum1, wlSum2 1219! logical :: nextPart 1220! integer :: largerSize, last, nBlocks, nLargerParts, partSize 1221! real(dp):: partWkld 1222 real(gp),allocatable:: prjWkld(:,:) 1223 1224! Initialize part boxes as total box 1225 forall (iPart=1:nParts) partBox(:,:,iPart) = box(:,:) 1226 1227! Trap a trivial case 1228 if (nParts<2) return 1229 1230! Check that nMesh and box are consistent with blockSize 1231 if (any(mod(nMesh,blockSize)/=0)) & 1232 call die(errHead//'nMesh and blockSize are not consistent') 1233 if (any(mod(box(1,:),blockSize)/=0) .or. & 1234 any(mod(box(2,:)+1,blockSize)/=0)) & 1235 call die(errHead//'box and blockSize are not consistent') 1236 1237! Check that workload is nonnegative 1238 if (any(workload<0._gp)) call die(errHead//'negative workload') 1239 1240! Allocate array for the projection of workload along each axis 1241 boxShape(:) = box(2,:) - box(1,:) + 1 1242 allocate( prjWkld(3,0:maxval(boxShape)-1) ) 1243 1244! Find projection of workload along each axis 1245 prjWkld = 0 1246 call projectMeshData( nMesh, wlDistr, workload, box, prjWkld ) 1247 1248#ifdef DEBUG_XC 1249! write(udebug,*) myName//'projected workload:' 1250! write(udebug,'(i4,3f15.6)') (i,prjWkld(:,i),i=0,maxval(boxShape)-1) 1251#endif /* DEBUG_XC */ 1252 1253! Find total workload in box 1254 wlSum = sum(prjWkld) / 3 ! Since the sum of each projection must be equal 1255 1256! Select axis for division 1257 if (axis<1 .or. axis>3) then ! Otherwise, accept input axis 1258 if (wlSum==0._dp) then ! Choose longest axis 1259 maxShape = maxval(boxShape) 1260 do ix = 3,1,-1 ! If equivalent, take largest ix (z better than x) 1261 if (boxShape(ix) == maxShape) then 1262 axis = ix 1263 exit ! ix loop 1264 end if 1265 end do 1266 else ! Choose axis with maximum spatial dispersion of workload 1267 maxDisp = 0 1268 do ix = 3,1,-1 1269 wlSum1 = 0 1270 wlSum2 = 0 1271 do i = 0,boxShape(ix)-1 1272 wlSum1 = wlSum1 + prjWkld(ix,i) * i 1273 wlSum2 = wlSum2 + prjWkld(ix,i) * i**2 1274 end do 1275 wlDisp = wlSum2/wlSum - (wlSum1/wlSum)**2 1276 if (wlDisp>maxDisp) then 1277 axis = ix 1278 maxDisp = wlDisp 1279 end if 1280 end do ! ix 1281 end if ! (wlSum==0._gp) 1282 end if ! (axis<1 .or. axis>3) 1283 1284! Divide box along axis 1285 if (wlSum==0._dp) then ! Divide box uniformly 1286 call divideBox1D( box(:,axis), nParts, partBox(:,axis,:), blockSize ) 1287 else ! Divide projected workload uniformly 1288 call divideBox1D( box(:,axis), nParts, partBox(:,axis,:), blockSize, & 1289 prjWkld(axis,0:boxShape(axis)-1) ) 1290 end if ! (wlSum==0._dp) 1291 1292#ifdef DEBUG_XC 1293! Check that all box limits are consistent 1294 if ( any(box(1,axis) > partBox(1,axis,:)) .or. & 1295 any(partBox(2,axis,1:nParts-1) >= partBox(1,axis,2:nParts)) .or. & 1296 any(partBox(1,axis,:) > partBox(2,axis,:)) .or. & 1297 any(partBox(2,axis,:) > box(2,axis)) ) then 1298 write(udebug,'(a,3(2i6,2x))') errHead//'box=', box 1299 write(udebug,'(a,/,(3(2i6,2x)))') errHead//'partBox=', partBox 1300 call die(errHead//'inconsistent partBox limits') 1301 end if 1302#endif /* DEBUG_XC */ 1303 1304 deallocate( prjWkld ) 1305 1306end subroutine divideBox3D 1307 1308!------------------------------------------------------------------------------- 1309 1310subroutine fftMeshDistr( nMesh, fftDistr, axisDistr ) 1311 1312! Creates and returns a homogeneaous 3D mesh distribution fftDistr, and three 1313! 2D distributions axisDistr, whose boxes extend wholly over one of the three 1314! cell axes. These axisDistr allow fully local 1D FFTs in one of the axes. 1315! The fftDistr and axisDistr are designed to make already optimal (without any 1316! collisions) the default sequence of inter-node communications, returned by 1317! all2allTransferOrder. This is beacause all simultaneous communications occur 1318! between nodes separated by the same span (difference between node indexes). 1319! Thus, no task arguments are required to optimize such communications. 1320 1321 implicit none 1322 integer, intent(in) :: nMesh(3) ! Mesh divisions of each axis 1323 integer, intent(inout):: fftDistr ! 3D mesh distr. used in 3D FFT 1324 integer,optional,intent(inout):: axisDistr(3) ! 2D distr. used for the 1D FFTs 1325 1326 character(len=*),parameter:: myName = 'fftMeshDistr ' 1327 character(len=*),parameter:: errHead = myName//'ERROR: ' 1328 integer:: axis, axis1, axis2, axis3, axisNodes(3), box0(2,3), boxNodes(3), & 1329 iDistr, iNode, iNode1, iNode2, iNode3, jNode2, jNode3, & 1330 node0, nodeSpan(3), oldDistr, rowMesh(3), rowNodes 1331 integer,allocatable:: subBox(:,:,:) 1332 logical:: found 1333 type(distrType),pointer:: distr 1334 1335! Trap the serial case (totNodes available from module parallel) 1336 if (totNodes==1) then 1337 fftDistr = 0 1338 if (present(axisDistr)) axisDistr(:) = 0 1339 return 1340 end if 1341 1342! Check if the input FFT distribution IDs are already defined as such 1343 iDistr = indexDistr( fftDistr ) 1344 if (iDistr>0 .and. iDistr<=maxDistr) then 1345 distr => storedMeshDistr(iDistr) 1346 if (distr%defined .and. all(distr%nMesh==nMesh)) then 1347 found = .true. ! But now check the axis distributions 1348 if (present(axisDistr)) then 1349 do axis = 1,3 1350 iDistr = indexDistr( axisDistr(axis) ) 1351 if (iDistr>0 .and. iDistr<=maxDistr) then 1352 distr => storedMeshDistr(iDistr) 1353 if (.not.distr%defined .or. any(distr%nMesh/=nMesh)) & 1354 found = .false. 1355 else 1356 found = .false. 1357 end if 1358 end do ! axis 1359 end if ! (present(axisDistr)) 1360 if (found) return ! Since the input values are still valid 1361 end if ! (distr%defined .and. ...) 1362 end if ! (iDistr>0) 1363 1364! Find optimal distribution of nodes on axes 1365 call optimizeNodeDistr( nMesh, totNodes, axisNodes ) 1366 1367! Create homogeneous 3D distribution 1368#ifdef DEBUG_XC 1369! write(udebug,*) myName//'calling setMeshDistr with fftDistr' 1370#endif /* DEBUG_XC */ 1371 call setMeshDistr( fftDistr, nMesh, nNodesX=axisNodes(1), & 1372 nNodesY=axisNodes(2), nNodesZ=axisNodes(3) ) 1373 1374! Return already if axis distributions are not required 1375 if (.not.present(axisDistr)) return 1376 1377! Find span between nodes along each axis 1378! nodeSpan(1) = 1 1379! nodeSpan(2) = axisNodes(1) 1380! nodeSpan(3) = axisNodes(1) * axisNodes(2) 1381! Third axis is innermost for node distribution (see setMeshDistr) 1382 nodeSpan(3) = 1 1383 nodeSpan(2) = axisNodes(3) 1384 nodeSpan(1) = axisNodes(3) * axisNodes(2) 1385 1386#ifdef DEBUG_XC 1387! write(udebug,'(a,3i4)') myName//'axisNodes=', axisNodes 1388! write(udebug,'(a,3i4)') myName//'nodeSpan=', nodeSpan 1389#endif /* DEBUG_XC */ 1390 1391! Allocate a small temporary array 1392 allocate( subBox(2,0:maxval(axisNodes)-1,3) ) 1393 1394! Loop on the three cell axes 1395 do axis1 = 1,3 1396 1397 ! (Re)initialize the axis distribution for the 1D FFT transforms 1398 oldDistr = axisDistr(axis1) 1399 call freeMeshDistr( axisDistr(axis1) ) 1400 call initDistr( axisDistr(axis1), nMesh, 0, totNodes ) 1401 1402 ! Find the two other axes 1403 axis2 = modulo(axis1,3) + 1 1404 axis3 = modulo(axis2,3) + 1 1405 1406 ! Point towards the apropriate axis distribution 1407 iDistr = indexDistr( axisDistr(axis1) ) 1408 distr => storedMeshDistr(iDistr) 1409 1410 ! Loop on the node boxes over the two other axes 1411 do iNode3 = 0,axisNodes(axis3)-1 1412 do iNode2 = 0,axisNodes(axis2)-1 1413 1414 ! Find the first node in the row of nodes along axis1 1415 node0 = nodeSpan(axis2)*iNode2 + nodeSpan(axis3)*iNode3 1416 1417 ! Find the box limits along axis2 and axis3 1418 call nodeMeshBox( nMesh, fftDistr, node0, box0 ) 1419 1420 ! Find the total box of a row of nodes along axis1 1421 rowNodes = axisNodes(axis1) 1422 rowMesh(1) = nMesh(axis1) 1423 rowMesh(2) = box0(2,axis2) - box0(1,axis2) + 1 1424 rowMesh(3) = box0(2,axis3) - box0(1,axis3) + 1 1425 1426 ! Find an optimal (re)distribution of rowNodes along axis2 and/or axis3 1427 boxNodes(1) = 1 1428 call optimizeNodeDistr( rowMesh(2:3), rowNodes, boxNodes(2:3) ) 1429 1430 ! Divide the row box along axis2 and/or axis3 1431 subBox(1,0,1) = 0 1432 subBox(2,0,1) = nMesh(axis1) - 1 1433 call divideBox1D( box0(:,axis2), boxNodes(2), subBox(:,0:boxNodes(2)-1,2)) 1434 call divideBox1D( box0(:,axis3), boxNodes(3), subBox(:,0:boxNodes(3)-1,3)) 1435 1436 ! Set the new boxes 1437 iNode = node0 1438 do jNode3 = 0,boxNodes(3)-1 1439 do jNode2 = 0,boxNodes(2)-1 1440 distr%box(1,axis1,iNode) = 0 1441 distr%box(2,axis1,iNode) = nMesh(axis1)-1 1442 distr%box(:,axis2,iNode) = subBox(:,jNode2,2) 1443 distr%box(:,axis3,iNode) = subBox(:,jNode3,3) 1444 iNode = iNode + nodeSpan(axis1) 1445 end do ! jNode2 1446 end do ! jNode3 1447 1448#ifdef DEBUG_XC 1449! write(udebug,'(a,2i4,3(2x,2i4))') & 1450! myName//'axis1,node0,box0=', axis1, node0, box0 1451! write(udebug,'(a,3i4)') myName//'boxNodes=', boxNodes 1452! do jNode2 = 0,boxNodes(2)-1 1453! write(udebug,'(a,i4,2x,3i4)') & 1454! myName//'axis2,subBox=', axis2, subBox(:,jNode2,2) 1455! end do 1456! do jNode3 = 0,boxNodes(3)-1 1457! write(udebug,'(a,i4,2x,3i4)') & 1458! myName//'axis3,subBox=', axis3, subBox(:,jNode3,3) 1459! end do 1460#endif /* DEBUG_XC */ 1461 1462 end do ! iNode2 1463 end do ! iNode3 1464 1465 ! Check if this distribution was already defined 1466 call reduceDistr( axisDistr(axis1) ) 1467 1468#ifdef DEBUG_XC 1469 iDistr = indexDistr( axisDistr(axis1) ) 1470 distr => storedMeshDistr(iDistr) 1471! write(udebug,'(a,3i4)') myName//'axis1,axis2,axis3=', axis1, axis2, axis3 1472 write(udebug,'(a,6x,3i4,3(2x,2i4))') & 1473 myName//' old/newDistrID,iDistr,myBox=', & 1474 oldDistr, axisDistr(axis1), iDistr, distr%box(:,:,myNode) 1475#endif /* DEBUG_XC */ 1476 1477 end do ! axis1 1478 1479! Deallocate temporary array 1480 deallocate( subBox ) 1481 1482end subroutine fftMeshDistr 1483 1484!------------------------------------------------------------------------------- 1485 1486subroutine freeMeshDistr( distrID ) 1487 1488! Erases a mesh distribution ID and frees the distribution slot if 1489! no other IDs are assigned to it 1490 1491 implicit none 1492 integer,intent(in):: distrID ! ID of an allocated distribution 1493 1494 character(len=*),parameter:: myName = 'freeMeshDistr ' 1495 character(len=*),parameter:: errHead = myName//'ERROR: ' 1496 integer:: iDistr, iID, iNode, it, iTask, taskID 1497 logical:: found 1498 type(distrType),pointer :: distr 1499 type(taskType),pointer :: task 1500 1501! Find internal distribution index 1502 iDistr = indexDistr( distrID ) 1503 1504! Check that distribution exists 1505 if (iDistr<1 .or. iDistr>maxDistr) return 1506 distr => storedMeshDistr(iDistr) ! Just a shorter name 1507 1508! Erase ID from the distribution 1509 do iID = 1,maxDistrID 1510 if (distr%ID(iID)==distrID) then 1511 distr%ID(iID) = -1 1512 exit ! iID loop 1513 end if 1514 end do ! iID 1515 1516! Free distribution if no other IDs are assigned to it 1517 if (all(distr%ID<0)) then 1518 ! First free all distribution tasks 1519#ifdef DEBUG_XC 1520! if (any(distr%task>0)) & 1521! write(udebug,'(a,i3,2x,25i3,/,(32x,25i3))') & 1522! myName//'iDistr,tasks=', iDistr, pack(distr%task,distr%task>0) 1523#endif /* DEBUG_XC */ 1524 do it = 1,maxDistrTasks 1525 iTask = distr%task(it) 1526 if (iTask>0) then 1527 task => storedMeshTask(iTask) 1528 found = .false. 1529 do iID = 1,maxTaskID ! Find a valid ID of task, to call freeMeshTask 1530 taskID = task%ID(iID) 1531 if (taskID>0) then 1532 found = .true. 1533 call freeMeshTask( taskID ) 1534 exit ! iID loop 1535 end if ! (taskID>0) 1536 end do ! iID 1537 if (.not.found) call die(myName//'ERROR: no valid task ID found') 1538 end if ! (iTask>0) 1539 end do ! it 1540 ! Finally free distribution itself 1541 deallocate( distr%box ) 1542 distr%defined = .false. 1543 end if 1544 1545end subroutine freeMeshDistr 1546 1547!------------------------------------------------------------------------------- 1548 1549subroutine freeMeshTask( taskID ) 1550 1551! Erases a mesh task ID and frees the task slot if 1552! no other IDs are assigned to it 1553 1554 implicit none 1555 integer,intent(in):: taskID ! ID of an allocated task 1556 1557 character(len=*),parameter:: myName = 'freeMeshTask ' 1558 character(len=*),parameter:: errHead = myName//'ERROR: ' 1559 integer:: distrID, i, iDistr, it, iTask, iID 1560 logical:: found 1561 type(distrType),pointer :: distr 1562 type(taskType),pointer :: task 1563 1564! Find internal task index 1565 iTask = indexTask( taskID ) 1566 1567! Check that task exists 1568 if (iTask<1 .or. iTask>maxTasks) return 1569 task => storedMeshTask(iTask) ! Just a shorter name 1570 1571#ifdef DEBUG_XC 1572! write(udebug,'(a,2i4,2x,2i4)') & 1573! myName//'taskID,iTask,task%distr=', taskID, iTask, task%distr 1574#endif /* DEBUG_XC */ 1575 1576! Erase ID from the task ID list 1577 do iID = 1,maxTaskID 1578 if (task%ID(iID)==taskID) then 1579 task%ID(iID) = -1 1580 exit ! iID loop 1581 end if 1582 end do ! iID 1583 1584! Free task if no other IDs are assigned to it 1585 if (all(task%ID<0)) then 1586 ! First eliminate task from all its associated distributions 1587 do i = 1,2 1588 iDistr = task%distr(i) 1589 if (iDistr>0) then 1590 distr => storedMeshDistr(iDistr) 1591 found = .false. 1592 do it = 1,maxDistrTasks 1593 if (distr%task(it)==iTask) then 1594 distr%task(it) = -1 1595 found = .true. 1596 end if 1597 end do ! it 1598 if (.not.found) call die(errHead//'task-distr association not found') 1599 end if ! (iDistr>0) 1600 end do ! i 1601 ! Finally free task itself 1602 deallocate( task%trsfDir, task%trsfNode, task%dstBox, task%srcBox ) 1603 task%associated = .false. 1604 task%optimized = .false. 1605 task%defined = .false. 1606 task%distr = -1 1607 task%ID = -1 1608 end if 1609 1610end subroutine freeMeshTask 1611 1612!------------------------------------------------------------------------------- 1613 1614subroutine gatherBoxes( box, boxes ) 1615 1616! Gathers the mesh boxes of all processors 1617 1618 implicit none 1619 integer,intent(in) :: box(2,3) ! My node's mesh box 1620 integer,intent(out):: boxes(2,3,0:totNodes-1) ! All node's boxes 1621 1622#ifdef HAVE_MPI 1623! Gather the boxes of all nodes 1624 call MPI_AllGather( box(1,1) , 6, MPI_Integer, & 1625 boxes(1,1,0), 6, MPI_Integer, COMM, MPIerror ) 1626#else 1627! Copy the only node's box 1628 boxes(:,:,0) = box(:,:) 1629#endif 1630 1631end subroutine gatherBoxes 1632 1633!------------------------------------------------------------------------------- 1634 1635integer function indexDistr( distrID ) 1636 1637! Returns the internal memory position in which a distribution is stored or 1638! 0 if distrID==0 1639! -1 if there is no allocated distribution with that ID 1640 1641 implicit none 1642 integer,intent(in):: distrID ! Distribution ID 1643 1644 integer:: iDistr 1645 1646 indexDistr = -1 1647 1648 if (distrID==0) then 1649 indexDistr = 0 1650 else if (distrID>0) then 1651 do iDistr = 1,maxDistr 1652 if ( storedMeshDistr(iDistr)%defined .and. & 1653 any(storedMeshDistr(iDistr)%ID==distrID) ) then 1654 indexDistr = iDistr 1655 return 1656 end if 1657 end do ! iDistr 1658 end if 1659 1660end function indexDistr 1661 1662!------------------------------------------------------------------------------- 1663 1664integer function indexTask( taskID ) 1665 1666! Returns the internal memory position in which a task is stored or 1667! -1 if there is no allocated task with that ID 1668 1669 implicit none 1670 integer,intent(in):: taskID ! Task ID 1671 1672 integer:: iTask 1673 1674 indexTask = -1 1675 1676 if (taskID>0) then 1677 do iTask = 1,maxTasks 1678 if ( any(storedMeshTask(iTask)%ID==taskID) ) then 1679 indexTask = iTask 1680 return 1681 end if 1682 end do ! iTask 1683 end if 1684 1685end function indexTask 1686 1687!------------------------------------------------------------------------------- 1688 1689subroutine initDistr( distrID, nMesh, firstNode, nNodes ) 1690 1691! Initializes a parallel distribution of mesh points among processor nodes. 1692! In serial execution (totNodes==1) it simply returns distrID=0. 1693 1694 implicit none 1695 integer,intent(out):: distrID ! ID assigned to the mesh distrib. 1696 integer,intent(in) :: nMesh(3) ! Mesh divisions in each axis 1697 ! (including "subpoints") 1698 integer,intent(in) :: firstNode ! First node in the mesh distr. 1699 integer,intent(in) :: nNodes ! Total nodes in the mesh distr 1700 ! If not present all nodes are used 1701 1702 character(len=*),parameter:: myName = moduleName//'initDistr ' 1703 character(len=*),parameter:: errHead = myName//'ERROR: ' 1704 type(distrType),pointer:: distr 1705 integer:: iDistr, iID 1706 logical:: found 1707 1708! Trap the serial case (totNodes available from module parallel) 1709 if (totNodes==1) then 1710 distrID = 0 1711 return 1712 end if 1713 1714! Find an available internal distribution slot 1715 found = .false. 1716 do iDistr = 1,maxDistr 1717 if (all(storedMeshDistr(iDistr)%ID<0)) then ! Available (empty) slot 1718 found = .true. 1719 exit ! iDistr loop 1720 end if 1721 end do ! iDistr 1722 if (.not.found) call die(errHead//'parameter maxDistr too small') 1723 1724! Increase private distribution ID counter and assign it to new distribution 1725 nDistrID = nDistrID + 1 1726 distrID = nDistrID 1727 1728! Set distribution parameters, except box 1729 distr => storedMeshDistr(iDistr) ! Just a shorter name 1730 distr%defined = .true. 1731 distr%firstNode = firstNode 1732 distr%nNodes = nNodes 1733 distr%nMesh = nMesh 1734 distr%task = -1 1735 distr%ID(:) = -1 1736 distr%ID(1) = distrID 1737 1738! Allocate box array 1739 allocate( distr%box(2,3,0:totNodes-1) ) 1740 1741! Initialize node boxes so that lower bound > upper bound (null box) 1742 distr%box(1,:,:) = 0 1743 distr%box(2,:,:) = -1 1744 1745#ifdef DEBUG_XC 1746! write(udebug,'(a,2i6)') myName//'distrID,iDistr=', distrID, iDistr 1747#endif /* DEBUG_XC */ 1748 1749end subroutine initDistr 1750 1751!------------------------------------------------------------------------------- 1752 1753subroutine initTask( taskID ) 1754 1755! Initializes the ID of a communication task of mesh data 1756 1757 implicit none 1758 integer,intent(out):: taskID 1759 1760 character(len=*),parameter:: myName = moduleName//'initTask ' 1761 character(len=*),parameter:: errHead = myName//'ERROR: ' 1762 logical:: found 1763 integer:: iTask 1764 type(taskType),pointer:: task 1765 1766! Trap the serial case (totNodes available from module parallel) 1767 if (totNodes==1) then 1768 taskID = 0 1769 return 1770 end if 1771 1772! Find an available internal distribution slot 1773 found = .false. 1774 do iTask = 1,maxTasks 1775 if (all(storedMeshTask(iTask)%ID<0)) then ! Available (empty) slot 1776 found = .true. 1777 exit ! iTask loop 1778 end if 1779 end do ! iTask 1780 if (.not.found) call die(errHead//'parameter maxTasks too small') 1781 1782! Increase private distribution ID counter and assign it to new distribution 1783 nTaskID = nTaskID + 1 1784 taskID = nTaskID 1785 1786! Set task parameters, except boxes 1787 task => storedMeshTask(iTask) ! Just a shorter name 1788 task%defined = .false. 1789 task%optimized = .false. 1790 task%associated = .false. 1791 task%distr = -1 1792 task%nTrsf = 0 1793 task%ID = -1 1794 task%ID(1) = taskID 1795 1796! Allocate transfer sequence (a send and a receive to/from each other node) 1797 allocate( task%trsfNode(0:2*(totNodes-1)) ) 1798 allocate( task%trsfDir(0:2*(totNodes-1)) ) 1799 1800! Allocate box arrays 1801 allocate( task%srcBox(2,3,0:totNodes-1) ) 1802 allocate( task%dstBox(2,3,0:totNodes-1) ) 1803 1804! Initialize boxes so that lower bound > upper bound (null box) 1805 task%srcBox(1,:,:) = 0 1806 task%srcBox(2,:,:) = -1 1807 task%dstBox(1,:,:) = 0 1808 task%dstBox(2,:,:) = -1 1809 1810#ifdef DEBUG_XC 1811! write(udebug,'(a,2i6)') myName//'taskID, iTask=', taskID, iTask 1812#endif /* DEBUG_XC */ 1813 1814end subroutine initTask 1815 1816!------------------------------------------------------------------------------- 1817 1818subroutine myMeshBox( nMesh, distrID, box ) 1819 1820! Finds the mesh box belonging to a node in a parallel mesh distribution, 1821! If iDistr=0, it returns box(1,:)=0 and box(2,:)=nMesh(:)-1 1822! If node stores no mesh points, its (empty) box returns with box(1,:)>box(2,:) 1823! Note: although nMesh is available from the distr. data, it is also included 1824! in the routine arguments, to handle directly the case iDistr=0. 1825 1826 implicit none 1827 integer, intent(in) :: nMesh(3) ! Mesh divisions in each axis 1828 integer, intent(in) :: distrID ! Mesh-distribution ID 1829 integer, intent(out):: box(2,3) ! Mesh box: box(1,:) = lower bounds 1830 ! box(2,:) = upper bounds 1831 1832 call nodeMeshBox( nMesh, distrID, myNode, box ) 1833 1834end subroutine myMeshBox 1835 1836!------------------------------------------------------------------------------- 1837 1838subroutine nodeMeshBox( nMesh, distrID, node, box ) 1839 1840! Finds the mesh box belonging to a node in a parallel mesh distribution, 1841! If iDistr=0, it returns box(1,:)=0 and box(2,:)=nMesh(:)-1 1842! If node stores no mesh points, its (empty) box returns with box(1,:)>box(2,:) 1843! Note: although nMesh is available from the distr. data, it is also included 1844! in the routine arguments, to handle directly the case iDistr=0. 1845 1846 implicit none 1847 integer,intent(in) :: nMesh(3) ! Mesh divisions in each axis 1848 integer,intent(in) :: distrID ! Mesh-distribution ID 1849 integer,intent(in) :: node ! Processor node whose box is needed 1850 ! If not present, node=myNode 1851 integer,intent(out):: box(2,3) ! Mesh box: box(1,:) = lower bounds 1852 ! box(2,:) = upper bounds 1853 1854 character(len=*),parameter:: myName = 'nodeMeshBox ' 1855 character(len=*),parameter:: errHead = myName//'ERROR: ' 1856 1857 integer:: iDistr 1858 1859 if (distrID==0) then ! Data not distributed 1860 box(1,:) = 0 1861 box(2,:) = nMesh(:)-1 1862 else 1863 iDistr = indexDistr( distrID ) 1864 if (iDistr>0 .and. iDistr<=maxDistr) then 1865 if (any(nMesh/=storedMeshDistr(iDistr)%nMesh)) & 1866 call die(errHead//'nMesh/=distr%nMesh') 1867 box(:,:) = storedMeshDistr(iDistr)%box(:,:,node) 1868 else 1869 call die(errHead//'undefined mesh distribution') 1870 end if 1871 end if 1872 1873end subroutine nodeMeshBox 1874 1875!------------------------------------------------------------------------------- 1876 1877subroutine optimizeNodeDistr( nMesh, nNodes, axisNodes ) 1878 1879! Finds a reasonably optimal distribution of nodes over the different axes. 1880! Optimal here means that it leads to most cubic mesh boxes for each node. 1881 1882 implicit none 1883 integer,intent(in) :: nMesh(:) ! Mesh points in each axis 1884 integer,intent(in) :: nNodes ! Total number of processor nodes 1885 integer,intent(out):: axisNodes(:) ! Number of nodes in each axis 1886 1887 character(len=*),parameter:: myName = moduleName//'optimizeNodeDistr ' 1888 character(len=*),parameter:: errHead = myName//'ERROR: ' 1889 integer,parameter:: maxFac = 1000 1890 integer :: fac(maxFac), i2, i3, i5, iFac1, iFac2, iRem, & 1891 n, n2, n3, n5, nAxes, nFac, nod1, nod2, nod3, nRem, remFac 1892 real(dp):: box1, box2, box3, boxAvg, boxErr, minErr 1893 1894! Find number of axes and handle the trivial single-axis case 1895 nAxes = size(nMesh) 1896 if (nAxes<1) then 1897 return 1898 else if (nAxes==1) then 1899 axisNodes(1) = nNodes 1900 return 1901 else if (nAxes>3) then 1902 call die(errHead//'not prepared for nAxes>3') 1903 end if 1904 1905! Find prime factors of nNodes (only 2, 3, and 5 coded presently) so that 1906! nNodes = 2**n2 * 3**n3 * 5**n5 * remFac**nRem 1907 n = nNodes 1908 n2 = 0 1909 n3 = 0 1910 n5 = 0 1911 do 1912 if (mod(n,2)==0) then 1913 n2 = n2 + 1 1914 n = n / 2 1915 else if (mod(n,3)==0) then 1916 n3 = n3 + 1 1917 n = n / 3 1918 else if (mod(n,5)==0) then 1919 n5 = n5 + 1 1920 n = n / 5 1921 else 1922 exit ! do loop 1923 end if 1924 end do 1925 remFac = n 1926 if (remFac==1) then 1927 nRem = 0 1928 else 1929 nRem = 1 1930 end if 1931 1932! Find all possible factors of nNodes 1933 nFac = 0 1934 do i2 = 0,n2 1935 do i3 = 0,n3 1936 do i5 = 0,n5 1937 do iRem = 0,nRem 1938 nFac = nFac + 1 1939 if (nFac>maxFac) call die(errHead//'parameter maxFac too small') 1940 fac(nFac) = 2**i2 * 3**i3 * 5**i5 * remFac**iRem 1941 end do ! iRem 1942 end do ! i5 1943 end do ! i3 1944 end do ! i2 1945 1946! Try all possible combinations of factors on each axis and select that with 1947! most cubic boxes 1948 minErr = huge(minErr) 1949 do iFac1 = 1,nFac 1950 nod1 = fac(iFac1) 1951 box1 = nMesh(1) / real(nod1,dp) 1952 if (nAxes==2) then 1953 nod2 = nNodes / nod1 1954 box2 = nMesh(2) / real(nod2,dp) 1955 boxAvg = (box1 + box2) / 2 1956 boxErr = log(box1/boxAvg)**2 + log(box2/boxAvg)**2 1957 if (boxErr<minErr) then 1958 axisNodes(1) = nod1 1959 axisNodes(2) = nod2 1960 minErr = boxErr 1961 end if 1962 else ! (nAxes==3) 1963 do iFac2 = 1,nFac 1964 nod2 = fac(iFac2) 1965 if (mod(nNodes,nod1*nod2) /= 0) cycle ! iFac2 loop 1966 nod3 = nNodes / (nod1*nod2) 1967 box2 = nMesh(2) / real(nod2,dp) 1968 box3 = nMesh(3) / real(nod3,dp) 1969 boxAvg = (box1 + box2 + box3) / 3 1970 boxErr = log(box1/boxAvg)**2 + log(box2/boxAvg)**2 + log(box3/boxAvg)**2 1971 if (boxErr<minErr) then 1972 axisNodes(1) = nod1 1973 axisNodes(2) = nod2 1974 axisNodes(3) = nod3 1975 minErr = boxErr 1976 end if 1977 end do ! iFac2 1978 end if ! (nAxes==2) 1979 end do ! iFac1 1980 1981end subroutine optimizeNodeDistr 1982 1983!------------------------------------------------------------------------------- 1984 1985subroutine optimizeTransferOrder( nNodes, node, nTrsf, trsfNode, trsfDir ) 1986 1987! Searches an optimal order of transfer communications 1988 1989 1990 implicit none 1991 integer,intent(in) :: nNodes ! Number of parallel processor nodes. The 1992 ! number of transfers per node is 2*nNodes-1 1993 integer,intent(in) :: node ! Local processor node, in range (0:nNodes-1) 1994 integer,intent(in) :: nTrsf ! Number of required transfers 1995 integer,intent(inout):: trsfNode(nTrsf) ! To/from transfer node. 1996 ! First 'transfer' is local: trsfNode(0)=node 1997 integer,intent(inout):: trsfDir(nTrsf) ! Transfer direction: 1998 ! trsfDir=+1 => from node to trsfNode 1999 ! trsfDir=-1 => from trsfNode to node 2000 2001 character(len=*),parameter:: myName = moduleName//'optimizeTransferOrder ' 2002 character(len=*),parameter:: errHead = myName//'ERROR: ' 2003 real(dp), parameter:: incrFact = 2.0_dp ! Memory increase factor >1 2004 integer:: dir12, i1, i2, iter, iTrsf, maxTime, maxTrsf, MPIerror, & 2005 myNode, node1, node2, nTrsfNode1, time, trsf12 2006 integer:: orderNode1(nNodes), orderNode2(2*nNodes) 2007 logical:: found 2008 real(dp):: nTrsfNode(2*nNodes) 2009 integer, pointer:: inTrsf(:,:)=>null(), & 2010 myTrsf(:)=>null(), outTrsf(:,:)=>null() 2011 2012! In serial execution, do nothing 2013#ifdef HAVE_MPI 2014 2015! Find the maximun number of transfers of any node 2016 call MPI_AllReduce( nTrsf, maxTrsf, 1, MPI_Integer, & 2017 MPI_Max, Comm, MPIerror ) 2018 2019! Trap a trivial case, in which there is nothing to optimize 2020 if (maxTrsf < 2) return 2021 2022#ifdef DEBUG_XC 2023! write(udebug,'(a,2i6)') myName//'nTrsf,maxTrsf=', nTrsf, maxTrsf 2024! if (maxTrsf > 2*nNodes) call die(errHead//'too many transfers per node') 2025#endif /* DEBUG_XC */ 2026 2027! Allocate arrays for the input transfer sequences of all nodes 2028 call re_alloc( myTrsf, 1,maxTrsf, name=myName//'myTrsf' ) 2029 call re_alloc( inTrsf, 1,maxTrsf, 1,nNodes, name=myName//'inTrsf' ) 2030 2031! Copy transfer sequence of my node to a different format 2032! Absolute value stores transfer node. Sign stores transfer direction. 2033! Internally, we use range (1:nNodes) rather than (0:nNodes-1) 2034 myNode = node + 1 2035 myTrsf(1:nTrsf) = trsfDir(:)*(trsfNode(:)+1) 2036 2037! Gather initial transfer sequence of all nodes 2038 call MPI_AllGather( myTrsf, maxTrsf, MPI_Integer, & 2039 inTrsf(1,1), maxTrsf, MPI_Integer, COMM, MPIerror ) 2040 2041! Find the number of transfers of each node 2042 nTrsfNode(1:nNodes) = count( inTrsf(:,1:nNodes)/=0, dim=1 ) 2043 2044! Find the order of nodes by increasing number of transfers 2045 call ordix( nTrsfNode, 1, nNodes, orderNode1 ) 2046 2047! Allocate array for the optimized transfer times of all nodes 2048 maxTime = maxTrsf * incrFact 2049 call re_alloc( outTrsf, 1,maxTime, 1,nNodes, name=myName//'outTrsf' ) 2050 2051! Schedule the transfers of each node 2052 do i1 = nNodes,1,-1 ! Give priority to busiest nodes 2053 node1 = orderNode1(i1) 2054 2055 ! Find the number of transfers of node 1 2056 nTrsfNode1 = count( inTrsf(:,node1)/=0 ) 2057 2058 ! Find the number of transfers of each node2, that transfers to/from node1 2059 do i2 = 1,nTrsfNode1 2060 node2 = abs( inTrsf(i2,node1) ) 2061 nTrsfNode(i2) = count( inTrsf(:,node2)/=0 ) 2062 end do 2063 2064 ! Order node2 by increasing number of transfers 2065 call ordix( nTrsfNode, 1, nTrsfNode1, orderNode2 ) 2066 2067 ! Assign a time to each unassigned transfer of node1 2068 do iTrsf = nTrsfNode1,1,-1 ! Give priority to busiest node2 2069 i2 = orderNode2(iTrsf) 2070 2071 trsf12 = inTrsf(i2,node1) 2072 node2 = abs( trsf12 ) 2073 dir12 = sign( 1, trsf12 ) 2074 2075 ! Check that this transfer is not already assigned 2076 if (any(outTrsf(:,node1)==trsf12)) cycle ! iTrsf loop 2077 2078 ! Look for the first available transfer time and assign it 2079 found = .false. 2080 realloc_loop: do iter = 1,2 ! Loop for case that maxTime is too small 2081 do time = 1,maxTime 2082 if (outTrsf(time,node1)==0 .and. outTrsf(time,node2)==0) then 2083 outTrsf(time,node1) = dir12*node2 2084 outTrsf(time,node2) = -dir12*node1 2085 found = .true. 2086 exit realloc_loop 2087 end if 2088 end do ! iTime 2089 ! If no available time found, increase maxTime 2090 maxTime = maxTime*incrFact + 1 2091 call re_alloc( outTrsf, 1,maxTime, 1,nNodes, & 2092 name=myName//'outTrsf', copy=.true. ) 2093 end do realloc_loop 2094 2095#ifdef DEBUG_XC 2096 if (.not.found) call die(errHead//'parameter incrFact too small') 2097#endif /* DEBUG_XC */ 2098 end do ! iTrsf 2099 2100 ! Copy transfer sequence to output arrays 2101 if (node1==myNode) then 2102 ! Pack transfer sequence, removing idle times for myNode 2103 myTrsf(1:nTrsf) = pack( outTrsf(:,myNode), outTrsf(:,myNode)/=0 ) 2104 ! The -1 is to change back from node range (1:nNodes) to (0:nNodes-1) 2105 trsfNode = abs( myTrsf(1:nTrsf) ) - 1 2106 trsfDir = sign( 1, myTrsf(1:nTrsf) ) 2107#ifdef DEBUG_XC 2108! write(udebug,'(a,20i4)') myName//' inTrsf=', inTrsf(1:nTrsf,myNode) 2109! write(udebug,'(a,20i4)') myName//'outTrsf=', myTrsf(1:nTrsf) 2110#endif /* DEBUG_XC */ 2111 exit ! i1 loop 2112 end if 2113 2114 end do ! i1 2115 2116! Deallocate arrays 2117 call de_alloc( outTrsf, name=myName//'outTrsf' ) 2118 call de_alloc( inTrsf, name=myName//'inTrsf' ) 2119 call de_alloc( myTrsf, name=myName//'myTrsf' ) 2120 2121#endif 2122 2123end subroutine optimizeTransferOrder 2124 2125!------------------------------------------------------------------------------- 2126 2127subroutine primeFactors( n, factor, power, nFactors ) 2128 2129! Finds prime factors of n 2130 2131 implicit none 2132 integer,intent(in) :: n ! Number to be factorized 2133 integer,intent(out):: factor(:) ! Different prime factors in ascending order 2134 integer,intent(out):: power(:) ! Power of each prime factor 2135 integer,intent(out):: nFactors ! Number of different factors found 2136 2137 integer:: i, m 2138 2139 nFactors = 0 2140 m = n 2141 do i = 2,n ! This is crude, but simple! 2142 if (mod(m,i)==0) then ! i is a factor o m 2143 nFactors = nFactors + 1 2144 if (nFactors>size(factor)) & 2145 call die(moduleName//'primeFactors: ERROR: size(factor) too small') 2146 factor(nFactors) = i 2147 power(nFactors) = 0 2148 do ! loop to find power 2149 if (mod(m,i)==0) then 2150 power(nFactors) = power(nFactors) + 1 2151 m = m / i 2152 else 2153 exit ! power loop 2154 end if ! (mod(m,i)==0) 2155 end do ! power loop 2156 end if ! (mod(m,i)==0) 2157 if (m==1) exit ! i loop 2158 end do ! i 2159 2160end subroutine primeFactors 2161 2162!------------------------------------------------------------------------------- 2163 2164subroutine projection( nMesh, srcBox, srcData, prjBox, prjData ) 2165 2166! Projects a box of values, defined in a 3D mesh of a periodic unit cell, 2167! onto each of the three mesh axes. The projected box must be entirely 2168! inside the source box. 2169 2170 implicit none 2171 integer, intent(in) :: nMesh(3) ! Global mesh size in each axis 2172 integer, intent(in) :: srcBox(2,3) ! Source data mesh box: 2173 ! srcBox(1,iAxis)=lower box limits 2174 ! srcBox(2,iAxis)=upper box limits 2175 real(gp),intent(in) :: srcData(0:,0:,0:) ! Source data values to be copied 2176 ! Its upper array bounds must be 2177 ! srcBox(2,iAxis)-srcBox(1,iAxis) 2178 integer, intent(in) :: prjBox(2,3) ! Projection mesh box. Must be 2179 ! prjBox(1,:)>=srcBox(1,:) and 2180 ! prjBox(2,:)<=srcBox(2,:) 2181 real(gp),intent(out):: prjData(:,0:) ! Projected data values: 2182 ! prjData(iAxis,0:prjBox(2,iAxis)-prjBox(1,iAxis)) 2183 2184 character(len=*),parameter:: myName = moduleName//'projection ' 2185 character(len=*),parameter:: errHead = myName//'ERROR: ' 2186 integer:: i1, i1max, i1min, i2, i2max, i2min, i3, i3max, i3min 2187 2188! Check box limits 2189 if (any(prjBox(1,:)<srcBox(1,:)) .or. & 2190 any(prjBox(2,:)>srcBox(2,:)) ) & 2191 call die(errHead//'prjBox outside srcBox') 2192 2193! Check array sizes 2194 if (any( shape(srcData) /= srcBox(2,:)-srcBox(1,:)+1 )) & 2195 call die(errHead//'shape of array srcData inconsistent with srcBox') 2196 if (size(prjData,2) < maxval(prjBox(2,:)-prjBox(1,:)+1)) & 2197 call die(errHead//'size of array prjData too small') 2198 2199! Find prjBox limits relative to srcBox origin 2200 i1min = prjBox(1,1) - srcBox(1,1) 2201 i1max = prjBox(2,1) - srcBox(1,1) 2202 i2min = prjBox(1,2) - srcBox(1,2) 2203 i2max = prjBox(2,2) - srcBox(1,2) 2204 i3min = prjBox(1,3) - srcBox(1,3) 2205 i3max = prjBox(2,3) - srcBox(1,3) 2206 2207! Find projection along each axis 2208 prjData = 0 2209 do i1 = i1min,i1max 2210 prjData(1,i1-i1min) = sum( srcData(i1,i2min:i2max,i3min:i3max) ) 2211 end do 2212 do i2 = i2min,i2max 2213 prjData(2,i2-i2min) = sum( srcData(i1min:i1max,i2,i3min:i3max) ) 2214 end do 2215 do i3 = i3min,i3max 2216 prjData(3,i3-i3min) = sum( srcData(i1min:i1max,i2min:i2max,i3) ) 2217 end do 2218 2219end subroutine projection 2220 2221!------------------------------------------------------------------------------- 2222 2223subroutine projectMeshData( nMesh, srcDistr, srcData, prjBox, prjData, task ) 2224 2225! Projects a box of values, defined in a 3D mesh of a periodic unit cell, 2226! onto each of the three mesh axes. 2227! The box limits may be partly or entirely outside the first unit cell. 2228! In parallel execution, the routine handles the interprocessor transfers 2229! needed to bring the requested box data from the nodes that store them. 2230 2231 implicit none 2232 integer, intent(in) :: nMesh(3) ! Global mesh size in each axis 2233 integer, intent(in) :: srcDistr ! Distribution ID of source data 2234 ! or zero for serial execution 2235 real(gp),intent(in) :: srcData(0:,0:,0:) ! Source data values to be copied 2236 ! Its upper array bounds must be 2237 ! those assigned to the node by 2238 ! the srcDistr parallel distrib. 2239 ! or nMesh(iAxis)-1 in serial exe 2240 integer, intent(in) :: prjBox(2,3) ! Projection mesh box: 2241 ! box(1,iAxis)=lower box limits 2242 ! box(2,iAxis)=upper box limits 2243 ! Box needs not be within the 2244 ! (0:nMesh-1) unit-cell range 2245 real(gp),intent(out):: prjData(:,0:) ! Projected data values: 2246 ! prjData(iAxis,0:prjBox(2,iAxis)-prjBox(1,iAxis)) 2247 integer,intent(inout),optional:: task ! ID of communication task 2248 2249 character(len=*),parameter:: myName = moduleName//'projectMeshData ' 2250 character(len=*),parameter:: errHead = myName//'ERROR: ' 2251 integer :: srcBox(2,3), srcMesh(3) 2252 real(gp),pointer:: mySrcData(:,:,:,:)=>null() 2253 2254! Associate task to distributions 2255 if (present(task)) call associateMeshTask( task, srcDistr ) 2256 2257! Find box limits of source data in this node 2258 call myMeshBox( nMesh, srcDistr, srcBox ) 2259 srcMesh = srcBox(2,:) - srcBox(1,:) + 1 2260 2261! Check array shapes 2262! if (any( shape(srcData) /= srcMesh )) & 2263! call die( errHead//'incorrect shape of srcData array' ) 2264 if (any( shape(srcData) /= srcMesh )) then 2265#ifdef DEBUG_XC 2266 write(udebug,'(a,i6,3(2x,2i4))') & 2267 errHead//'srcDistr,srcBox =', srcDistr, srcBox 2268 write(udebug,'(a,3i6,3x,3i6)') & 2269 errHead//'shape(srcData),srcMesh =', shape(srcData), srcMesh 2270#endif /* DEBUG_XC */ 2271 call die( errHead//'incorrect shape of srcData array' ) 2272 end if 2273 if (size(prjData,2) < maxval(prjBox(2,:)-prjBox(1,:)+1)) & 2274 call die( errHead//'size of prjData array too small' ) 2275 2276! Copy srcData into a rank-4 array to call reduceData 2277 call re_alloc( mySrcData, 0,srcMesh(1)-1, 0,srcMesh(2)-1, 0,srcMesh(3)-1, & 2278 1,1, myName//'mySrcData' ) 2279 mySrcData(0:srcMesh(1)-1,0:srcMesh(2)-1,0:srcMesh(3)-1,1) = & 2280 srcData(0:srcMesh(1)-1,0:srcMesh(2)-1,0:srcMesh(3)-1) 2281 2282! Find projection 2283 call reduceData( nMesh, srcBox, mySrcData, prjBox, prjData=prjData, & 2284 taskID=task ) 2285 2286! Deallocate temporary array 2287 call de_alloc( mySrcData, myName//'mySrcData' ) 2288 2289end subroutine projectMeshData 2290 2291!------------------------------------------------------------------------------- 2292 2293subroutine redistributeMeshData( srcDistr, srcData, dstDistr, dstData, task ) 2294 2295! Copies data values on the mesh with a new mesh distribution among nodes 2296 2297 implicit none 2298 integer, intent(in) :: srcDistr ! Distrib. ID of srcData 2299 real(gp),target,intent(in) :: srcData(0:,0:,0:,:) ! Input source data 2300 integer, intent(in) :: dstDistr ! Distrib. ID of dstData 2301 real(gp),pointer :: dstData(:,:,:,:) ! Output destination array 2302 ! (it must be different from srcData) 2303 integer,intent(inout),optional:: task ! ID of communication task 2304 2305 character(len=*),parameter:: myName = 'redistributeMeshData ' 2306 character(len=*),parameter:: errHead = myName//'ERROR: ' 2307 integer:: dstBox(2,3), dstMesh(3), iDistr, nData, nMesh(3) 2308 integer:: i1, i2, i3, idat ! provisional addition. JMS, June 2013 2309 2310! For serial execution, avoid allocating a new array 2311 if (srcDistr==0 .and. dstDistr==0) then 2312 dstData => srcData 2313 return 2314 else if (srcDistr<=0 .or. dstDistr<=0) then 2315 call die(errHead//'invalid srcDistr or dstDistr') 2316 end if 2317 2318! Find mesh box of node in new distribution 2319 iDistr = indexDistr( dstDistr ) 2320 if (iDistr<=0 .or. iDistr>maxDistr) call die(errHead//'invalid dstDistr') 2321 nMesh = storedMeshDistr(iDistr)%nMesh 2322 call myMeshBox( nMesh, dstDistr, dstBox ) 2323 dstMesh(:) = dstBox(2,:) - dstBox(1,:) + 1 2324 2325! Allocate destination data array 2326 nData = size(srcData,4) 2327 call re_alloc( dstData, 0,dstMesh(1)-1, 0,dstMesh(2)-1, 0,dstMesh(3)-1, & 2328 1,nData, name=myName//'dstData', copy=.false., shrink=.true. ) 2329 2330! Copy srcData to dstData 2331 if ( sameMeshDistr(srcDistr,dstDistr) ) then 2332 ! Provisional loops to circunvent an apparent compiler bug. JMS, June 2013. 2333 do idat = 1,nData 2334 do i3 = 0,dstMesh(3)-1 2335 do i2 = 0,dstMesh(2)-1 2336 do i1 = 0,dstMesh(1)-1 2337 dstData(i1,i2,i3,idat) = srcData(i1,i2,i3,idat) 2338 end do 2339 end do 2340 end do 2341 end do 2342! dstData = srcData ! original code 2343 else 2344 call copyMeshData( nMesh, srcDistr, srcData, dstBox, dstData, task ) 2345 end if 2346 2347end subroutine redistributeMeshData 2348 2349!------------------------------------------------------------------------------- 2350 2351subroutine reduceData( nMesh, srcBox, srcData, dstBox, dstData, prjData, & 2352 taskID ) 2353 2354! Reduces (adds) a box of values defined in a 3D mesh of a periodic unit cell. 2355! The box limits may be entirely inside, or partially or totally outside 2356! the unit cell, which is periodically repeated to fill the box. 2357! In parallel execution, the routine handles the interprocessor transfers 2358! needed to bring the requested box data from the nodes that store them. 2359 2360 implicit none 2361 integer, intent(in) :: nMesh(3) ! Global mesh size in each axis 2362 integer, intent(in) :: srcBox(2,3) ! Source mesh box: 2363 ! box(1,iAxis)=lower box limits 2364 ! box(2,iAxis)=upper box limits 2365 ! Boxes may be larger than the 2366 ! (0:nMesh-1) unit-cell range 2367 real(gp),intent(in) :: srcData(0:,0:,0:,:) ! Source data values to be copied 2368 ! Its upper array bounds must be 2369 ! enough to contain the node box in 2370 ! the srcDistr parallel distrib. 2371 ! or nMesh(iAxis)-1 in serial exe 2372 integer, intent(in) :: dstBox(2,3) ! Destination mesh box 2373 real(gp),intent(inout),optional:: dstData(0:,0:,0:,:) 2374 ! Destination data values. 2375 ! The upper array bounds must be 2376 ! dstBox(2,iAxis)-dst(1,iAxis) 2377 ! dstData is NOT initialized 2378 real(gp),intent(out),optional:: prjData(:,0:) ! Projected data values: 2379 ! prjData(iAxis,0:prjBox(2,iAxis)-prjBox(1,iAxis)) 2380 ! Only srcData(:,:,:,1) is projected 2381 integer,intent(inout),optional:: taskID ! ID of communication task 2382 2383 character(len=*),parameter:: myName = moduleName//'reduceData ' 2384 character(len=*),parameter:: errHead = myName//'ERROR: ' 2385 integer,allocatable:: dstBoxes(:,:,:), srcBoxes(:,:,:), & 2386 trsfDir(:), trsfNode(:) 2387 integer :: comBox(2,3), arrayS(4,2), & 2388 dstComBox(2,3,maxParts), dstNode, dstSize, & 2389 i, ic, ix, iNode, iPart, iTask, iTrsf, & 2390 maxMesh, nData, nParts, nTrsf, partSize, & 2391 sizeSum(0:maxParts), srcComBox(2,3,maxParts), & 2392 srcNode, srcSize, trsfSize, trueTrsf 2393 logical :: taskDefined, taskOptimized 2394 real(gp),allocatable:: prjBuff(:,:) 2395 real(gp),pointer:: trsfBuff(:)=>null() 2396 type(taskType),pointer:: task 2397 2398#ifdef DEBUG_XC 2399! write(udebug,'(a,i4,3(2x,2i4),2x,3(2x,2i4))') & 2400! myName//'myNode+1,srcBox,dstBox=', myNode+1, srcBox, dstBox 2401#endif /* DEBUG_XC */ 2402 2403! Check that one of dstData or prjData is present 2404 if (.not.present(dstData) .and. .not.present(prjData)) return 2405 2406! Check array shapes 2407 nData = size(srcData,4) 2408 do ic = 1,3 2409 if (size(srcData,ic) < srcBox(2,ic)-srcBox(1,ic)+1) & 2410 call die( errHead//'incorrect array shape of srcData' ) 2411 end do 2412 if (present(dstData)) then 2413 do ic = 1,3 2414 if (size(dstData,ic) < dstBox(2,ic)-dstBox(1,ic)+1) & 2415 call die( errHead//'incorrect array shape of dstData' ) 2416 end do 2417 if (size(dstData,4) /= nData) & 2418 call die( errHead//'inconsistent size of srcData and dstData' ) 2419 end if 2420 if (present(prjData)) then 2421 if (size(prjData,2) < maxval(dstBox(2,:)-dstBox(1,:)+1)) & 2422 call die( errHead//'size of prjData array too small' ) 2423 end if 2424 2425! Allocate buffer for projection data transfer 2426 if (present(prjData)) then 2427 maxMesh = maxval(nMesh) 2428 allocate( prjBuff(3,0:maxMesh-1) ) 2429 end if 2430 2431! Find common intersection between srcBox and dstBox 2432 call commonBox( nMesh, srcBox, dstBox, srcComBox, dstComBox, sizeSum, nParts ) 2433 2434! Copy local data from each part of source box 2435 if (present(dstData)) then 2436 do iPart = 1,nParts 2437 dstData(dstComBox(1,1,iPart):dstComBox(2,1,iPart), & 2438 dstComBox(1,2,iPart):dstComBox(2,2,iPart), & 2439 dstComBox(1,3,iPart):dstComBox(2,3,iPart),:) = & 2440 dstData(dstComBox(1,1,iPart):dstComBox(2,1,iPart), & 2441 dstComBox(1,2,iPart):dstComBox(2,2,iPart), & 2442 dstComBox(1,3,iPart):dstComBox(2,3,iPart),:) + & 2443 srcData(srcComBox(1,1,iPart):srcComBox(2,1,iPart), & 2444 srcComBox(1,2,iPart):srcComBox(2,2,iPart), & 2445 srcComBox(1,3,iPart):srcComBox(2,3,iPart),:) 2446 end do ! iPart 2447 end if ! (present(dstData)) 2448 2449! Project local data onto axes of projection box 2450 if (present(prjData)) then 2451 prjData(:,:) = 0 2452 do iPart = 1,nParts 2453 ! Find common box relative to origin of unit cell 2454 comBox(1,:) = srcComBox(1,:,iPart) + srcBox(1,:) 2455 comBox(2,:) = srcComBox(2,:,iPart) + srcBox(1,:) 2456 ! Find projection of srcData in comBox 2457 call projection( nMesh, srcBox, srcData(:,:,:,1), comBox, prjBuff ) 2458 ! Accumulate projection onto output array 2459 do ix = 1,3 2460 prjData(ix,dstComBox(1,ix,iPart):dstComBox(2,ix,iPart)) = & 2461 prjData(ix,dstComBox(1,ix,iPart):dstComBox(2,ix,iPart)) + & 2462 prjBuff(ix,0:comBox(2,ix)-comBox(1,ix)) 2463 end do 2464 end do ! iPart 2465 end if ! (present(prjData)) 2466 2467! If totNodes=1 => serial case (all data are local) => all is done 2468 if (totNodes==1) return 2469 2470! Else we need MPI code 2471#ifdef HAVE_MPI 2472 2473! Find the stored data about this communication task 2474 taskDefined = .false. 2475 taskOptimized = .false. 2476 if (present(taskID)) then 2477 iTask = indexTask( taskID ) 2478 if (iTask<0) then ! Create a new task 2479 call initTask( taskID ) 2480 iTask = indexTask( taskID ) 2481 end if ! (iTask<0) 2482 task => storedMeshTask(iTask) 2483 taskDefined = task%defined 2484 taskOptimized = task%optimized 2485 end if ! (present(task)) 2486 2487! Check task boxes 2488 if (taskDefined) then 2489 if (any( task%srcBox(:,:,myNode) /= srcBox )) & 2490 call die( errHead//'srcBox inconsistent with taskID' ) 2491 if (any( task%dstBox(:,:,myNode) /= dstBox )) & 2492 call die( errHead//'dstBox inconsistent with taskID' ) 2493 end if 2494 2495! Find the box limits of all nodes 2496 allocate( srcBoxes(2,3,0:totNodes-1), dstBoxes(2,3,0:totNodes-1) ) 2497 if (taskDefined) then 2498 srcBoxes(:,:,:) = task%srcBox(:,:,:) 2499 dstBoxes(:,:,:) = task%dstBox(:,:,:) 2500 else ! Gather the boxes from all nodes 2501 call gatherBoxes( srcBox, srcBoxes ) 2502 call gatherBoxes( dstBox, dstBoxes ) 2503 if (present(taskID)) then ! Store boxes for future calls 2504 task%srcBox(:,:,:) = srcBoxes(:,:,:) 2505 task%dstBox(:,:,:) = dstBoxes(:,:,:) 2506 task%defined = .true. 2507#ifdef DEBUG_XC 2508! write(udebug,'(a,2i4,3(2x,2i4))') & 2509! myName//'taskID,iTask,srcBox=', taskID, iTask, srcBox 2510! write(udebug,'(31x,a,2i4,3(2x,2i4))') & 2511! 'dstBox=', taskID, iTask, dstBox 2512#endif /* DEBUG_XC */ 2513 end if 2514 end if 2515 2516#ifdef DEBUG_XC 2517! do iNode = 0,totNodes-1 2518! call nodeMeshBox( nMesh, srcDistr, iNode, srcBox ) 2519! write(udebug,'(a,i4,2x,3(2i4,2x),3(2i4,2x))') & 2520! myName//'node,srcBox,dstBox=', & 2521! iNode, srcBoxes(:,:,iNode), dstBoxes(:,:,iNode) 2522! end do 2523#endif /* DEBUG_XC */ 2524 2525! Find the order of inter-node transfers 2526 nTrsf = 2*(totNodes-1) ! One send and one receive to/from each other node 2527 allocate( trsfDir(0:nTrsf), trsfNode(0:nTrsf) ) 2528 if (taskOptimized) then 2529 nTrsf = task%nTrsf 2530 trsfDir(:) = task%trsfDir(:) 2531 trsfNode(:) = task%trsfNode(:) 2532 else ! (.not.taskOptimized) 2533 call all2allTransferOrder( totNodes, myNode, nTrsf, trsfNode, trsfDir ) 2534 end if ! (taskOptimized) 2535 2536#ifdef DEBUG_XC 2537! write(udebug,'(a,i4,2x,20i4)') myName//'myNode+1,Trsfs=', & 2538! myNode+1, trsfDir(1:nTrsf)*(trsfNode(1:nTrsf)+1) 2539#endif /* DEBUG_XC */ 2540 2541! Find required size of buffer to transfer data 2542 trsfSize = 0 2543 if (present(dstData)) then 2544 do iTrsf = 1,nTrsf 2545 iNode = trsfNode(iTrsf) 2546 if (trsfDir(iTrsf)==+1) then ! Size required to send 2547 srcSize = product( srcBox(2,:)-srcBox(1,:)+1 ) 2548 dstSize = product( dstBoxes(2,:,iNode)-dstBoxes(1,:,iNode)+1 ) 2549 trsfSize = max( trsfSize, min(srcSize,dstSize) ) 2550 else ! Size required to receive 2551 srcSize = product( srcBoxes(2,:,iNode)-srcBoxes(1,:,iNode)+1 ) 2552 dstSize = product( dstBox(2,:)-dstBox(1,:)+1 ) 2553 trsfSize = max( trsfSize, min(srcSize,dstSize) ) 2554 end if ! (trsfDir(iTrsf)==+1) 2555 end do ! iTrsf 2556 trsfSize = trsfSize*nData 2557 end if 2558 if (present(prjData)) trsfSize = trsfSize + 3*maxMesh*maxParts 2559 2560! Allocate buffer to transfer data 2561 call re_alloc( trsfBuff, 1, trsfSize, name=myName//'trsfBuff' ) 2562 2563! Loop on transfer sequence 2564 trueTrsf = 0 2565 do iTrsf = 1,nTrsf 2566 2567 if (trsfDir(iTrsf)==+1) then ! My node will send data 2568 2569 ! Find destination node 2570 dstNode = trsfNode(iTrsf) 2571 2572 ! Find common intersection between srcBox and dstBox 2573 call commonBox( nMesh, srcBox, dstBoxes(:,:,dstNode), & 2574 srcComBox, dstComBox, sizeSum, nParts ) 2575 sizeSum = sizeSum * nData 2576 2577 ! Select this transfer only if it is needed 2578 if (nParts>0) then 2579 trueTrsf = trueTrsf + 1 2580 trsfDir(trueTrsf) = trsfDir(iTrsf) 2581 trsfNode(trueTrsf) = trsfNode(iTrsf) 2582 else ! (nParts==0) 2583 cycle ! iTrsf loop 2584 end if ! (nParts>0) 2585 2586#ifdef DEBUG_XC 2587 ! Check buffer size 2588 if (present(dstData) .and. sizeSum(nParts) > size(trsfBuff)) & 2589 call die(errHead//'size(trsfBuff) too small') 2590#endif /* DEBUG_XC */ 2591 2592#ifdef DEBUG_XC 2593! if (nParts>0) then 2594! write(udebug,'(a,2i6)') myName//'srcNode,dstNode=', myNode, dstNode 2595! write(udebug,'(a,2i6)') myName//' sizeSum=',sizeSum(0:nParts) 2596! write(udebug,'(a,3(2i4,2x))') myName//' srcBox=', srcBox 2597! write(udebug,'(a,3(2i4,2x))') myName//' dstBox=', & 2598! dstBoxes(:,:,dstNode) 2599! end if 2600 2601! do iPart = 1,nParts 2602! write(udebug,'(a,3(2i4,2x))') myName//'srcComBox=', & 2603! ((srcComBox(i,ix,iPart),i=1,2),ix=1,3) 2604! write(udebug,'(a,3(2i4,2x))') myName//'dstComBox=', & 2605! ((dstComBox(i,ix,iPart),i=1,2),ix=1,3) 2606! end do ! iPart 2607#endif /* DEBUG_XC */ 2608 2609 ! Copy data from each part of common box to a transfer buffer 2610 if (present(dstData)) then 2611 do iPart = 1,nParts 2612 !partSize = sizeSum(iPart) - sizeSum(iPart-1) 2613 arrayS(1:3,1) = srcComBox(1,:,iPart) + 1 2614 arrayS(4,1) = 1 2615 arrayS(1:3,2) = srcComBox(2,:,iPart) + 1 2616 arrayS(4,2) = nData 2617 call array_copy(arrayS(:,1), arrayS(:,2), srcData(:,:,:,:), & 2618 sizeSum(iPart-1)+1, sizeSum(iPart), trsfBuff(:) ) 2619 !trsfBuff(sizeSum(iPart-1)+1:sizeSum(iPart)) = & 2620 ! reshape( srcData(srcComBox(1,1,iPart):srcComBox(2,1,iPart), & 2621 ! srcComBox(1,2,iPart):srcComBox(2,2,iPart), & 2622 ! srcComBox(1,3,iPart):srcComBox(2,3,iPart),:), & 2623 ! (/partSize/) ) 2624 end do ! iPart 2625 trsfSize = sizeSum(nParts) 2626 else ! (.not.present(dstData)) 2627 trsfSize = 0 2628 end if ! (present(dstData)) 2629 2630 ! Project data from each part of common box onto the transfer buffer 2631 if (present(prjData)) then 2632 do iPart = 1,nParts 2633 ! Find common box relative to origin of first unit cell 2634 comBox(1,:) = srcComBox(1,:,iPart) + srcBox(1,:) 2635 comBox(2,:) = srcComBox(2,:,iPart) + srcBox(1,:) 2636 ! Find projection of srcData in comBox 2637 call projection( nMesh, srcBox, srcData(:,:,:,1), comBox, prjBuff ) 2638 ! Copy projection data onto a transfer buffer 2639 call array_copy([1, 1], [3, maxMesh], prjBuff(:,:), & 2640 trsfSize+1, trsfSize+3*maxMesh, trsfBuff(:)) 2641 ! AG: Work around gfortran bug: specify 0 lbound in reshape 2642 !trsfBuff(trsfSize+1:trsfSize+3*maxMesh) = & 2643 ! reshape( prjBuff(:,0:), (/3*maxMesh/) ) 2644 trsfSize = trsfSize + 3*maxMesh 2645 end do ! iPart 2646 end if ! (present(prjData)) 2647 2648#ifdef DEBUG_XC 2649! write(udebug,'(a,2i4,3(2x,2i4),2x,3(2x,2i4),i8)') & 2650! myName//' myNode,dstNode, myBox,dstBox,trsfSize=', & 2651! myNode+1, dstNode+1, srcBox, dstBoxes(:,:,dstNode), trsfSize 2652#endif /* DEBUG_XC */ 2653 2654 ! Send data buffer 2655 if (trsfSize>0) & 2656 call MPI_Send( trsfBuff(1:trsfSize), trsfSize, MPI_grid_real, & 2657 dstNode, 0, COMM, MPIerror ) 2658 2659 else ! My node will receive data 2660 2661 ! Find source node and its mesh box 2662 srcNode = trsfNode(iTrsf) 2663 2664 ! Find common intersection between srcBox and dstBox 2665 call commonBox( nMesh, srcBoxes(:,:,srcNode), dstBox, & 2666 srcComBox, dstComBox, sizeSum, nParts ) 2667 sizeSum = sizeSum * nData 2668 2669 ! Select this transfer only if it is needed 2670 if (nParts>0) then 2671 trueTrsf = trueTrsf + 1 2672 trsfDir(trueTrsf) = trsfDir(iTrsf) 2673 trsfNode(trueTrsf) = trsfNode(iTrsf) 2674 else ! (nParts==0) 2675 cycle ! iTrsf loop 2676 end if ! (nParts>0) 2677 2678#ifdef DEBUG_XC 2679 ! Check buffer size 2680 if (present(dstData) .and. sizeSum(nParts)>size(trsfBuff)) & 2681 call die(errHead//'size(trsfBuff) too small') 2682#endif /* DEBUG_XC */ 2683 2684#ifdef DEBUG_XC 2685! if (nParts>0) then 2686! write(udebug,'(a,2i6)') myName//'srcNode,dstNode=', srcNode, myNode 2687! write(udebug,'(a,2i6)') myName//' sizeSum=',sizeSum(0:nParts) 2688! write(udebug,'(a,3(2i4,2x))') myName//' srcBox=', srcBox 2689! write(udebug,'(a,3(2i4,2x))') myName//' dstBox=', dstBox 2690! end if 2691 2692! do iPart = 1,nParts 2693! write(udebug,'(a,3(2i4,2x))') myName//'srcComBox=', & 2694! ((srcComBox(i,ix,iPart),i=1,2),ix=1,3) 2695! write(udebug,'(a,3(2i4,2x))') myName//'dstComBox=', & 2696! ((dstComBox(i,ix,iPart),i=1,2),ix=1,3) 2697! end do ! iPart 2698#endif /* DEBUG_XC */ 2699 2700 ! Receive data buffer 2701 trsfSize = 0 2702 if (present(dstData)) trsfSize = trsfSize + sizeSum(nParts) 2703 if (present(prjData)) trsfSize = trsfSize + 3*maxMesh*nParts 2704 if (trsfSize>0) & 2705 call MPI_Recv( trsfBuff(1:trsfSize), trsfSize, MPI_grid_real, & 2706 srcNode, 0, COMM, MPIstatus, MPIerror ) 2707 2708#ifdef DEBUG_XC 2709! write(udebug,'(a,2i4,3(2x,2i4),2x,3(2x,2i4),i8)') & 2710! myName//'srcNode, myNode,srcBox, myBox,trsfSize=', & 2711! srcNode+1, myNode+1, srcBoxes(:,:,srcNode), dstBox, trsfSize 2712#endif /* DEBUG_XC */ 2713 2714 ! Copy buffer data for each part of common box to destination array 2715 if (present(dstData)) then 2716 do iPart = 1,nParts 2717 arrayS(1:3,1) = dstComBox(1,:,iPart) + 1 2718 arrayS(4,1) = 1 2719 arrayS(1:3,2) = dstComBox(2,:,iPart) + 1 2720 arrayS(4,2) = nData 2721 call array_add(sizeSum(iPart-1)+1, sizeSum(iPart), trsfBuff(:), & 2722 arrayS(:,1), arrayS(:,2), dstData(:,:,:,:)) 2723 !comShape(1:3) = dstComBox(2,:,iPart) - dstComBox(1,:,iPart) + 1 2724 !comShape(4) = nData 2725 !dstData(dstComBox(1,1,iPart):dstComBox(2,1,iPart), & 2726 ! dstComBox(1,2,iPart):dstComBox(2,2,iPart), & 2727 ! dstComBox(1,3,iPart):dstComBox(2,3,iPart),:) = & 2728 !dstData(dstComBox(1,1,iPart):dstComBox(2,1,iPart), & 2729 ! dstComBox(1,2,iPart):dstComBox(2,2,iPart), & 2730 ! dstComBox(1,3,iPart):dstComBox(2,3,iPart),:) + & 2731 ! reshape( trsfBuff(sizeSum(iPart-1)+1:sizeSum(iPart)), comShape ) 2732 end do ! iPart 2733 trsfSize = sizeSum(nParts) 2734 else ! (.not.present(dstData)) 2735 trsfSize = 0 2736 end if ! (present(dstData)) 2737 2738 ! Copy buffer data for each part of common box to destination array 2739 if (present(prjData)) then 2740 do iPart = 1,nParts 2741 ! Find common box relative to origin of first unit cell 2742 comBox(1,:) = srcComBox(1,:,iPart) + srcBox(1,:) 2743 comBox(2,:) = srcComBox(2,:,iPart) + srcBox(1,:) 2744 ! Copy transfer buffer onto a projection buffer 2745 call array_copy(trsfSize+1, trsfSize+3*maxMesh, trsfBuff(:), & 2746 [1, 1], [3, maxMesh], prjBuff(:,:)) 2747 !prjBuff(:,:) = & 2748 ! reshape( trsfBuff(trsfSize+1:trsfSize+3*maxMesh), (/3,maxMesh/) ) 2749 ! Add projection of common box to output array 2750 do ix = 1,3 2751 prjData(ix,dstComBox(1,ix,iPart):dstComBox(2,ix,iPart)) = & 2752 prjData(ix,dstComBox(1,ix,iPart):dstComBox(2,ix,iPart)) + & 2753 prjBuff(ix,0:comBox(2,ix)-comBox(1,ix)) 2754 end do 2755 trsfSize = trsfSize + 3*maxMesh 2756 end do ! iPart 2757 end if ! (present(prjData)) 2758 2759 end if ! (iDir(iTrsf)==+1) 2760 2761 end do ! iTrsf 2762 2763 ! Optimize transfer sequence and store it 2764 if (present(taskID) .and. .not.taskOptimized) then 2765 nTrsf = trueTrsf 2766 call optimizeTransferOrder( totNodes, myNode, nTrsf, & 2767 trsfNode(1:nTrsf), trsfDir(1:nTrsf) ) 2768 task%nTrsf = nTrsf 2769 task%trsfNode = trsfNode 2770 task%trsfDir = trsfDir 2771 task%optimized = .true. 2772#ifdef DEBUG_XC 2773! write(udebug,'(a,4i4,/,(20i4))') & 2774! myName//'taskID,iTask,distr1,distr2,Trsfs=', & 2775! taskID, iTask, task%distr, trsfDir(1:nTrsf)*trsfNode(1:nTrsf) 2776#endif /* DEBUG_XC */ 2777 end if 2778 2779 ! Deallocations 2780 call de_alloc( trsfBuff, name=myName//'trsfBuff' ) 2781 deallocate( trsfDir, trsfNode, dstBoxes, srcBoxes ) 2782 2783#endif 2784! End of MPI code 2785 2786 if (present(prjData)) deallocate( prjBuff ) 2787 2788end subroutine reduceData 2789 2790!------------------------------------------------------------------------------- 2791 2792subroutine reduceDistr( distrID ) 2793 2794! Find if this distribution was already defined, and resets it if so 2795 2796 implicit none 2797 integer,intent(inout):: distrID ! Distribution ID 2798 2799 character(len=*),parameter:: myName = moduleName//'reduceDistr ' 2800 character(len=*),parameter:: errHead = myName//'ERROR: ' 2801 logical:: found 2802 integer:: iDistr, iID, jDistr 2803 type(distrType),pointer:: newDistr, oldDistr 2804 2805! Find the new distribution 2806 iDistr = indexDistr( distrID ) 2807 if (iDistr==0) then 2808 return 2809 else if (iDistr<0 .or. iDistr>maxDistr) then 2810 call die(errHead//'invalid distrID') 2811 end if 2812 newDistr => storedMeshDistr(iDistr) 2813 2814! Compare it with all previous distributions 2815 do jDistr = 1,maxDistr 2816 2817 ! Check that everything agrees, otherwise cycle 2818 if (jDistr==iDistr) cycle ! jDistr loop 2819 oldDistr => storedMeshDistr(jDistr) 2820 if (.not.oldDistr%defined) cycle 2821 if (oldDistr%nNodes/=newDistr%nNodes) cycle 2822 if (oldDistr%firstNode/=newDistr%firstNode) cycle 2823 if (any(oldDistr%nMesh(:)/=newDistr%nMesh(:))) cycle 2824 if (any(oldDistr%box(:,:,:)/=newDistr%box(:,:,:))) cycle 2825 ! If this point is reached, both distributions are equal. 2826 2827#ifdef DEBUG_XC 2828! write(udebug,'(a,3i4)') & 2829! myName//'distrID,oldDistr,newDistr=', distrID, jDistr, iDistr 2830#endif /* DEBUG_XC */ 2831 2832 ! Free new distribution 2833 call freeMeshDistr( distrID ) 2834 2835 ! Assign new ID to oldDistr 2836 found = .false. 2837 do iID = 1,maxDistrID ! Loop on ID slots of oldDistr 2838 if (oldDistr%ID(iID)<0) then ! Empty ID slot 2839 oldDistr%ID(iID) = distrID 2840 found = .true. 2841 exit ! iID loop 2842 end if 2843 end do ! iID 2844 if (.not.found) call die(errHead//'parameter maxDistrID too small') 2845 2846 exit ! jDistr loop 2847 end do ! jDistr 2848 2849end subroutine reduceDistr 2850 2851!------------------------------------------------------------------------------- 2852 2853logical function sameMeshDistr( ID1, ID2 ) 2854 2855 implicit none 2856 integer,intent(in):: ID1, ID2 ! Mesh distribution IDs 2857 2858 integer:: i1, i2 2859 type(distrType),pointer:: distr1, distr2 2860 2861 if (ID1==ID2) then ! Same IDs => same distr. 2862 sameMeshDistr = .true. 2863 else ! Different IDs 2864 i1 = indexDistr(ID1) 2865 i2 = indexDistr(ID2) 2866 if (i1<0 .or. i2<0) then ! One or both distr. are not defined 2867 sameMeshDistr = .false. 2868 else ! Both distr. are defined 2869 if (i1==i2) then ! Different IDs but same distr. 2870 sameMeshDistr = .true. 2871 else if (i1==0 .or. i2==0) then ! Since 0 is valid only in serial mode 2872 sameMeshDistr = .false. 2873 else ! Different but possibly equivalent distr. 2874 distr1 => storedMeshDistr(i1) 2875 distr2 => storedMeshDistr(i2) 2876 if (all(distr1%box==distr2%box)) then 2877 sameMeshDistr = .true. 2878 else 2879 sameMeshDistr = .false. 2880 end if ! (all(distr1%box==distr2%box)) 2881 end if ! (i1==i2) 2882 end if ! (i1>0 .and. i2>0) 2883 end if ! (ID1==ID2) 2884 2885end function sameMeshDistr 2886 2887!------------------------------------------------------------------------------- 2888 2889subroutine setMeshDistr( distrID, nMesh, box, firstNode, nNodes, & 2890 nNodesX, nNodesY, nNodesZ, nBlock, & 2891 wlDistr, workload ) 2892 2893! Defines a parallel distribution of mesh points among processor nodes. 2894! In serial execution (totNodes==1) it simply returns distrID=0. 2895! If the input distribution ID is still valid, the same value is returned. 2896! Otherwise, it is freed. New IDs are never repeated. 2897 2898 implicit none 2899 integer, intent(inout):: distrID ! ID assigned to the mesh distrib. 2900 integer, intent(in) :: nMesh(3) ! Mesh divisions in each axis 2901 ! (including "subpoints") 2902 integer,optional,intent(in) :: box(2,3) ! Mesh box of my processor node 2903 ! If present(box), all other 2904 ! optional arguments are ignored 2905 integer,optional,intent(in) :: firstNode ! First node in the mesh distr. 2906 integer,optional,intent(in) :: nNodes ! Total nodes in the mesh distr 2907 ! If not present all nodes are used 2908 integer,optional,intent(in) :: nNodesX ! Nodes in the X (first) axis. Must 2909 ! be present if present(nNodesYZ) 2910 integer,optional,intent(in) :: nNodesY ! Nodes in the Y (second) axis 2911 ! Must be present if present(nNodesZ) 2912 integer,optional,intent(in) :: nNodesZ ! Nodes in the Z (third) axis 2913 ! If nNodesXYZ are not present, 2914 ! the distribution of nodes on all 2915 ! axes is optimized by setMeshDistr. 2916 ! If only nNodesX is present, it is 2917 ! optimized in Y and Z axes 2918 integer,optional,intent(in) :: nBlock ! Size of blocks of mesh points, in 2919 ! each axis, which are not splitted 2920 ! in different nodes 2921 integer,optional,intent(in) :: wlDistr ! Distr. index of workload array 2922 real(gp),optional,intent(in):: workload(0:,0:,0:) ! Approx. relative workload 2923 ! of mesh points. Must be nonnegative 2924 ! at all points and have nonzero sum 2925 2926 character(len=*),parameter:: myName = 'setMeshDistr ' 2927 character(len=*),parameter:: errHead = myName//'ERROR: ' 2928 integer, parameter:: maxFactors = 100 ! Max prime factors in nNodes 2929 type(distrType),pointer:: distr, newDistr, oldDistr 2930 integer,allocatable:: axisBox(:,:,:), partBox(:,:,:) 2931 integer:: axis, axisNodes(3), blockSize, boxSize, & 2932 factor(maxFactors), groupSize, & 2933 i1, i2, i3, iAxis, iBox, iDistr, iFac, iID, iNode, iPow, & 2934 jDistr, largerBoxSize, lastNode, & 2935 maxFactor, meshNodes, myGroup, myBox(2,3), myPart, & 2936 nFactors, node0, nParts, nRem, nBlocks, newDistrID, & 2937 oldDistrID, partSize, power(maxFactors) 2938 logical:: found 2939 character(len=5):: dat 2940 2941! Trap the serial case 2942 if (totNodes==1) then 2943 distrID = 0 2944 return 2945 end if 2946 2947! Find number of mesh nodes 2948 if (present(nNodes)) then 2949 if (nNodes<1 .or. nNodes>totNodes) then 2950 call die(errHead//'nNodes out of range') 2951 else 2952 meshNodes = nNodes 2953 end if 2954 else ! Assume that all nodes participate in the mesh distribution 2955 meshNodes = totNodes 2956 end if 2957 2958! Find first node in mesh distribution 2959 if (present(firstNode)) then 2960 lastNode = firstNode + meshNodes - 1 2961 if (firstNode<0 .or. lastNode>totNodes-1) then 2962 call die(errHead//'firstNode/nNodes out of range') 2963 else 2964 node0 = firstNode 2965 end if 2966 else ! (.not.present(firstNode)) => use firstNode=0 2967 node0 = 0 2968 end if 2969 2970! Find size of blocks of points and check that nMesh is a multiple of it 2971 if (present(nBlock)) then 2972 blockSize = nBlock 2973 else 2974 blockSize = 1 2975 end if 2976 if (any(mod(nMesh,blockSize)/=0)) & 2977 call die(errHead//'nMesh inconsistent with nBlock') 2978 2979! Store input distribution ID for later comparison 2980 oldDistrID = distrID 2981 2982! Initialize new distribution 2983 call initDistr( newDistrID, nMesh, node0, meshNodes ) 2984 iDistr = indexDistr( newDistrID ) 2985 distr => storedMeshDistr(iDistr) ! Just a shorter name 2986 2987! Handle box argument with priority 2988 if (present(box)) then 2989 ! Collect all node boxes and store them 2990 call gatherBoxes( box, distr%box ) 2991 goto 999 ! Exit, since no other arguments must be considered in this case 2992 end if ! (present(box)) 2993 2994! Find box sizes 2995 if (present(workload)) then 2996 2997 ! Check that the distribution of workload array is available 2998 if (.not.present(wlDistr)) call die(errHead//'workload without wlDistr') 2999 3000 ! Check that myNode (available from module parallel) contains some points 3001 if (myNode>=node0 .and. myNode<node0+meshNodes) then 3002 3003 ! Find prime factors of meshNodes, in ascending order 3004 call primeFactors( meshNodes, factor, power, nFactors ) 3005 3006 ! Allocate a small temporary array 3007 maxFactor = maxval( factor(1:nFactors) ) 3008 allocate( partBox(2,3,0:maxFactor-1) ) 3009 3010 ! Iteratively divide full box and select my node's part 3011 myBox(1,:) = 0 ! Initialize all boxes with full box 3012 myBox(2,:) = nMesh(:)-1 3013 groupSize = meshNodes ! Group = set of nodes assigned to present box 3014 do iFac = nFactors,1,-1 ! Use larger factors first (for larger boxes) 3015 do iPow = 1,power(iFac) 3016 nParts = factor(iFac) ! Number of parts to divide present box 3017 partSize = groupSize / nParts ! Nodes that will be assigned to part 3018 myGroup = (myNode - node0) / groupSize ! My group for present box 3019 myPart = (myNode-node0-myGroup*groupSize) / partSize ! My part within 3020 ! my group 3021 axis = 0 ! So that it will be chosen by divideBox3D 3022#ifdef DEBUG_XC 3023! write(udebug,'(a,3(2x,2i4))') myName//'dividing box=', myBox 3024#endif /* DEBUG_XC */ 3025 call divideBox3D( nMesh, wlDistr, workload, myBox, & 3026 nParts, blockSize, axis, partBox ) 3027 myBox(:,:) = partBox(:,:,myPart) ! Select my part of the box 3028 groupSize = partSize ! Reduce group size 3029#ifdef DEBUG_XC 3030! write(udebug,'(a,3(2x,2i4))') myName//' my part box=', myBox 3031#endif /* DEBUG_XC */ 3032 end do ! iPow 3033 end do ! iFac 3034 3035 deallocate( partBox ) 3036 3037 else ! my node contains no mesh points 3038 myBox(1,:) = 0 3039 myBox(2,:) = -1 3040 end if ! (myNode>=node0 .and. myNode<node0+meshNodes) 3041 3042 ! Gather the box limits of all nodes 3043 call gatherBoxes( myBox, distr%box ) 3044 3045 else ! (.not.present(workload)) => distribute nodes homogeneously in space 3046 3047 ! Find distribution of nodes along each axis 3048 if (present(nNodesX)) then 3049 axisNodes(1) = nNodesX 3050 if (present(nNodesY)) then 3051 axisNodes(2) = nNodesY 3052 if (present(nNodesZ)) then 3053 axisNodes(3) = nNodesZ 3054 else 3055 axisNodes(3) = meshNodes / (nNodesX*nNodesY) 3056 end if 3057 else if (present(nNodesZ)) then 3058 call die(errHead//'nNodesZ present without nNodesY') 3059 else ! (.not.present(nNodesYZ)) => optimize distribution in Y and Z axes 3060 call optimizeNodeDistr( nMesh(2:3), meshNodes/nNodesX, axisNodes(2:3) ) 3061 end if ! (present(nNodesY)) 3062 else if (present(nNodesY).or.present(nNodesZ)) then 3063 call die(errHead//'nNodesY or nNodesZ present without nNodesX') 3064 else ! (.not.present(nNodesXYZ)) => optimize distribution in all three axes 3065 call optimizeNodeDistr( nMesh(1:3), meshNodes, axisNodes(1:3) ) 3066 end if ! (present(nNodesX)) 3067 3068 ! Adjust number of nodes to that along the axes 3069 if (product(axisNodes) > meshNodes) then 3070 call die(errHead//'inconsistent nNodesXYZ') 3071 else 3072 meshNodes = product(axisNodes) 3073 end if 3074 3075 ! Allocate a small temporary array 3076 allocate( axisBox(2,3,-1:maxval(axisNodes)-1) ) 3077 3078 ! Find mesh partition along each axis 3079 do iAxis = 1,3 3080 nBlocks = nMesh(iAxis) / blockSize ! Blocks along axis 3081 boxSize = nBlocks / axisNodes(iAxis) ! Blocks per node 3082 nRem = nBlocks - boxSize*axisNodes(iAxis) ! Remaining blocks 3083 largerBoxSize = (boxSize+1)*blockSize ! Mesh points per node 3084 boxSize = boxSize*blockSize ! Mesh points per node 3085 axisBox(2,iAxis,-1) = -1 ! So that axisBox(1,ix,0)=0 3086 do iBox = 0,axisNodes(iAxis)-1 3087 axisBox(1,iAxis,iBox) = axisBox(2,iAxis,iBox-1) + 1 3088 if (iBox < nRem) then ! First nRem boxes of larger size 3089 axisBox(2,iAxis,iBox) = axisBox(2,iAxis,iBox-1) + largerBoxSize 3090 else ! Rest of boxes of normal size 3091 axisBox(2,iAxis,iBox) = axisBox(2,iAxis,iBox-1) + boxSize 3092 end if 3093 end do ! iBox 3094 end do ! iAxis 3095 3096 ! Find box bounds of each node 3097 ! Innermost loop is i3 for compatibility with JDG distribution 3098 iNode = node0 3099 do i1 = 0,axisNodes(1)-1 3100 do i2 = 0,axisNodes(2)-1 3101 do i3 = 0,axisNodes(3)-1 3102 distr%box(:,1,iNode) = axisBox(:,1,i1) 3103 distr%box(:,2,iNode) = axisBox(:,2,i2) 3104 distr%box(:,3,iNode) = axisBox(:,3,i3) 3105 iNode = iNode + 1 3106 end do ! i3 3107 end do ! i2 3108 end do ! i1 3109 3110 ! Deallocate temporary array 3111 deallocate( axisBox ) 3112 3113 end if ! (present(workload)) 3114 3115! Exit point 3116999 continue 3117 3118! Find if the old (input) distrID is still valid 3119 if (sameMeshDistr(oldDistrID,newDistrID)) then ! Old distr. is still valid 3120 call freeMeshDistr( newDistrID ) 3121 distrID = oldDistrID 3122 else ! Old distribution is no longer valid 3123 call freeMeshDistr( oldDistrID ) 3124 ! Reset newDistrID if this distribution was already defined 3125 call reduceDistr( newDistrID ) 3126 distrID = newDistrID 3127#ifdef DEBUG_XC 3128 iDistr = indexDistr( distrID ) 3129 distr => storedMeshDistr(iDistr) 3130 if (present(box)) then 3131 dat = 'box' 3132 elseif (present(workload)) then 3133 dat = 'wkld' 3134 elseif (present(nNodesX)) then 3135 dat = 'xNod' 3136 else 3137 dat = 'none' 3138 endif 3139 write(udebug,'(a,a6,3i4,3(2x,2i4))') & 3140 myName//'dat,old/newDistrID,iDistr,myBox=', & 3141 dat, oldDistrID, newDistrID, iDistr, distr%Box(:,:,myNode) 3142#endif /* DEBUG_XC */ 3143 end if 3144 3145end subroutine setMeshDistr 3146 3147!------------------------------------------------------------------------------- 3148 3149subroutine unitCellParts( nMesh, fullBox, partBox, partBox0, nParts ) 3150 3151! Divides a mesh box into parts, each within a single periodic cell 3152! and returns the equivalent partial boxes within the first unit cell 3153 3154 implicit none 3155 integer,intent(in) :: nMesh(3) ! Number of mesh points in each axis 3156 ! of the periodic unit cell 3157 integer,intent(in) :: fullBox(2,3) ! Box to be partitioned, possibly 3158 ! extending over several unit cells 3159 integer,intent(out):: partBox(:,:,:) ! Partitioned boxes, each within a 3160 ! single unit cell: 3161 ! box(1,iAxis,iPart)=lower bounds 3162 ! box(2,iAxis,iPart)=upper bounds 3163 integer,intent(out):: partBox0(:,:,:) ! Equivalent part boxes in the first 3164 ! unit cell 3165 integer,intent(out):: nParts ! Number of parts found 3166 3167 character(len=*),parameter:: myName = moduleName//'unitCellParts ' 3168 character(len=*),parameter:: errHead = myName//'ERROR: ' 3169 integer:: iDiv, iPart, ix, maxParts 3170 real(dp):: rMesh(3) 3171 3172 maxParts = size(partBox,3) 3173 rMesh = nMesh ! Real version of nMesh, so that partBox/rMesh is real 3174 partBox(:,:,1) = fullBox 3175 nParts = 1 3176 divideLoop: do ! Until no part can be further divided 3177 do iPart = 1,nParts ! Loop on already existing parts 3178 do ix = 1,3 ! Axis loop 3179 ! iDiv is the highest surface, separating two unit cells, below the 3180 ! upper bound of partBox 3181 iDiv = floor(partBox(2,ix,iPart)/rMesh(ix)) * nMesh(ix) 3182 if (partBox(1,ix,iPart) < iDiv) then ! iDiv is between lower and 3183 ! upper bounds of partBox 3184 nParts = nParts + 1 3185 if (nParts > maxParts) call die(errHead//'maxParts too small') 3186 partBox(:,:,nParts) = partBox(:,:,iPart) 3187 partBox(1,ix,nParts) = iDiv ! New part is upper one 3188 partBox(2,ix,iPart) = iDiv - 1 ! Old part reduced to lower one 3189 cycle divideLoop 3190 endif 3191 end do ! ix 3192 end do ! iPart 3193 exit divideLoop ! since no part can be further divided 3194 end do divideLoop 3195 3196 ! Find equivalent boxes in first unit cell 3197 forall(ix=1:3) & 3198 partBox0(:,ix,1:nParts) = modulo( partBox(:,ix,1:nParts), nMesh(ix) ) 3199 3200end subroutine unitCellParts 3201 3202!------------------------------------------------------------------------------- 3203 3204END MODULE mesh3D 3205 3206