    [xLinAlg] add reducing mechanism to mumps distributed interface · 9ca299e0
    Alexis SALZMAN authored
    When dealing with a large number of cores, the mumps distributed
    interface gives poor performance if the problem is small. Presenting
    the matrix in distributed format prevents some internal optimizations
    (available with the centralized matrix format) and, among other
    things, makes it harder for mumps to estimate its memory needs. This
    leads to an underestimated memory evaluation during the analysis
    phase, which stops the computation during factorization with a -9
    error. Even if this memory issue is bypassed, CPU time increases with
    the number of cores when more cores than needed are used.
    
    In this commit an extra parameter ratio_reduce_comm_ is added to the
    connectMatrix method of the xLinearSystemSolverMumpsDistributed
    class. It corresponds, roughly, to an "ideal" ratio between the
    number of cores to use and the problem size "n":
    mx = ratio_reduce_comm_ * n gives a rough estimate of the maximum
    number of cores needed by the mumps computation for a linear system
    of size "n". If the communicator given to the mumps interface is
    larger than this mx estimate, only mx cores are used with mumps. In
    that case the interface allocates its own memory to store the matrix
    terms, because it regroups them on the reduced set of cores
    participating in the mumps computation, and the user has to call
    reduceMatrices() whenever he updates values in the matrix storage
    connected to the interface: some communication is needed every time
    the terms of the connected matrix change. This somewhat gives up the
    nice single memory space otherwise shared between the connected
    matrix and mumps.
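    
    A hypothetical usage sketch in C++ is given below: only the class,
    method and parameter names (xLinearSystemSolverMumpsDistributed,
    connectMatrix, reduceMatrices, ratio_reduce_comm_) come from this
    commit; the header, the matrix/vector types, the solve call and the
    exact signatures are assumptions made for the illustration.
    
        // Hypothetical usage sketch, not taken from the library sources.
        #include <mpi.h>
        // #include "xLinearSystemSolverMumpsDistributed.h"  // assumed header name
        
        // DistributedMatrix, Vector and updateMatrixValues stand in for user
        // side types/code and are not part of xLinAlg.
        void assembleAndSolve(DistributedMatrix &A, Vector &rhs, Vector &sol,
                              MPI_Comm world)
        {
           xLinearSystemSolverMumpsDistributed solver(world);  // assumed ctor
        
           // New extra parameter: with n unknowns, mx = ratio_reduce_comm_ * n
           // caps the number of cores actually handed to mumps.
           const double ratio_reduce_comm_ = 1.e-4;  // problem dependent value
           solver.connectMatrix(A, ratio_reduce_comm_);
           solver.solve(rhs, sol);                   // assumed solve call
        
           // If the communicator was reduced, the interface keeps its own copy
           // of the matrix terms: after updating values in the connected
           // storage, regroup them before solving again.
           updateMatrixValues(A);
           solver.reduceMatrices();
           solver.solve(rhs, sol);
        }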
    
    When the communicator given to the mumps interface is smaller than
    this mx estimate, the interface behaves as before. The same holds if
    ratio_reduce_comm_ is null.
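    
    The selection rule can be sketched as follows; this is only a
    minimal, self-contained illustration of the behaviour described
    above, the function name and signature are made up for the example.
    
        #include <algorithm>
        #include <mpi.h>
        
        // Rough cap on the number of cores handed to mumps for an n-unknown
        // system, following the rule described in this commit message.
        int coresUsedByMumps(MPI_Comm comm, long long n, double ratio_reduce_comm)
        {
           int comm_size;
           MPI_Comm_size(comm, &comm_size);
        
           // A null ratio disables the mechanism: behave as before.
           if (ratio_reduce_comm <= 0.) return comm_size;
        
           // mx = ratio_reduce_comm * n, at least one core.
           const int mx = std::max(1, static_cast<int>(ratio_reduce_comm * n));
        
           // Communicator already small enough: nothing changes. Otherwise
           // only mx cores participate and terms must be regrouped on them.
           return std::min(comm_size, mx);
        }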
    
    This solution is an intermediate between the centralized and the
    fully distributed matrix formats for mumps. If the ratio leads to
    mx=1, an even cleaner implementation would be to switch to the
    centralized matrix format for mumps. TODO.
    
    For now memory consumption is not optimized, as the terms regrouped
    on the processes that hold the matrix may be duplicated (i.e. a term
    (i,j) may appear many times because it is present in many processes).
    This is not a problem for mumps, which sums them, but it costs
    memory. Reducing this consumption is not so easy: the communication
    buffers have to transfer all those terms, so their size may be large
    anyway. TODO.
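    
    One possible direction (not done in this commit) would be to sum the
    duplicated (i,j) entries once they are regrouped; a minimal sketch,
    assuming a plain COO storage, is given below. As said above, the
    communication buffers would still carry every contribution, so only
    the gathered storage would shrink.
    
        #include <map>
        #include <utility>
        #include <vector>
        
        struct CooTerm { int i, j; double v; };
        
        // Merge duplicated (i,j) entries by summing their values, which is
        // what mumps does internally anyway; only the stored size shrinks.
        std::vector<CooTerm> mergeDuplicates(const std::vector<CooTerm> &terms)
        {
           std::map<std::pair<int, int>, double> acc;
           for (const auto &t : terms) acc[{t.i, t.j}] += t.v;
        
           std::vector<CooTerm> merged;
           merged.reserve(acc.size());
           for (const auto &kv : acc)
              merged.push_back({kv.first.first, kv.first.second, kv.second});
           return merged;
        }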