[xLinAlg] add reducing mechanism to mumps distributed interface
When the mumps distributed interface is used on a large number of cores for a small problem, the mumps library performs poorly. Supplying the matrix in distributed format disables some internal optimizations that are available with the centralized matrix format and, among other things, mumps then has difficulty estimating its memory needs. The analysis phase underestimates the required memory, and the factorization then stops with a -9 error. Even when this memory issue is bypassed, CPU time increases with the number of cores once more cores than needed are used.

This commit adds an extra parameter, ratio_reduce_comm_, to the connectMatrix method of the xLinearSystemSolverMumpsDistributed class. It corresponds roughly to an "ideal" ratio between the number of cores to use and the problem size "n": mx = ratio_reduce_comm_ * n gives a rough estimate of the maximum number of cores needed by the mumps computation for a linear system of size "n". If the communicator given to the mumps interface is larger than this mx estimate, only mx cores are used with mumps.

In that case the interface allocates its own memory to store the matrix terms, because it gathers them on the reduced set of cores taking part in the mumps computation. Some communication is therefore needed every time terms change in the connected matrix, and the user has to call reduceMatrices() after updating values in the matrix storage connected to the interface. This gives up, to some extent, the convenient single memory space shared between the connected matrix and mumps.

When the communicator given to the mumps interface is smaller than this mx estimate, the interface behaves as before. The same holds if ratio_reduce_comm_ is zero.

This solution is an intermediate between the centralized and the fully distributed matrix formats for mumps. If the ratio leads to mx=1, a cleaner implementation would be to switch to the centralized matrix format for mumps. TODO.
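The core-count rule described above can be sketched as follows. This is only an illustration, not the code from this commit: the helper names reducedCoreCount and needsReduceMatrices are hypothetical, and the rounding of mx (rounded down, never below one core) is an assumption, since the message only calls mx a rough estimate.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of the commit): number of cores that would
// take part in the mumps computation for a system of size n.
inline int reducedCoreCount(double ratio, std::size_t n, int comm_size)
{
    assert(comm_size >= 1);
    // A zero ratio disables the mechanism: the whole communicator is used.
    if (ratio <= 0.0) return comm_size;
    // mx = ratio * n; rounding down and clamping to 1 are assumptions.
    double mx = std::floor(ratio * static_cast<double>(n));
    if (mx < 1.0) mx = 1.0;
    // Communicator not larger than mx: the interface behaves as before.
    if (mx >= static_cast<double>(comm_size)) return comm_size;
    return static_cast<int>(mx);
}

// Hypothetical helper: true when only a subset of the cores is used, i.e.
// when the user would have to call reduceMatrices() after each update.
inline bool needsReduceMatrices(double ratio, std::size_t n, int comm_size)
{
    return reducedCoreCount(ratio, n, comm_size) < comm_size;
}
```

With a ratio of 1/1024 and n = 8192, for example, this sketch keeps 8 cores out of a 64-core communicator, while a 4-core communicator is left untouched.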
For now memory consumption is not optimized: the terms regrouped on the processes that hold the matrix may be duplicated (i.e. a term i,j may appear many times because it is present on many processes). This is not a problem for mumps, which sums them, but it costs memory. Reducing this consumption is not easy: the communication buffers have to transfer all those terms anyway, so their size may be significant. TODO.
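One possible direction for this TODO, given only as a sketch, is to sum duplicated (i,j) terms on the receiving process once they have been gathered. The Triplet struct and the sumDuplicates function below are hypothetical and do not reflect the storage used by the interface. This would only reduce the memory held after regrouping, not the size of the communication buffers.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical coordinate-format entry: row index, column index, value.
struct Triplet
{
    int i;
    int j;
    double v;
};

// Sort the gathered terms by (i, j) and sum entries sharing the same
// indices, so that each (i, j) position is stored only once.
inline std::vector<Triplet> sumDuplicates(std::vector<Triplet> terms)
{
    std::sort(terms.begin(), terms.end(),
              [](const Triplet &a, const Triplet &b)
              { return a.i < b.i || (a.i == b.i && a.j < b.j); });

    std::vector<Triplet> merged;
    for (const Triplet &t : terms)
    {
        if (!merged.empty() && merged.back().i == t.i && merged.back().j == t.j)
            merged.back().v += t.v;  // same position: accumulate the value
        else
            merged.push_back(t);     // first occurrence of this position
    }
    return merged;
}
```

The result is mathematically the same matrix mumps would build by summing the duplicates itself, at the cost of an extra sort on the processes that hold the regrouped terms.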
Showing with 244 additions and 40 deletions