    [xLinAlg] add reducing mechanism to mumps distributed interface · 9ca299e0
    Alexis SALZMAN authored
    When dealing with a large number of cores, the mumps distributed
    interface gives poor performance if the problem is small. Presenting
    the matrix in distributed format prevents some internal optimizations
    (available with the centralized matrix format) and, among other
    things, makes it harder for mumps to estimate its memory needs. This
    leads to an underestimated memory evaluation during the analysis
    phase, which stops the computation during factorization with a -9
    error. Even if this memory issue is bypassed, CPU time increases with
    the number of cores when more cores than needed are used.
    
    In this commit an extra parameter ratio_reduce_comm_ is added to the
    connectMatrix method of the xLinearSystemSolverMumpsDistributed
    class. It corresponds, roughly, to an "ideal" ratio between the
    number of cores to use and the problem size "n":
    mx = ratio_reduce_comm_ * n gives a rough estimate of the maximum
    number of cores needed by the mumps computation for a linear system
    of size "n". If the communicator given to the mumps interface is
    larger than this mx estimate, only mx cores are used with mumps. In
    that case the interface allocates its own memory to store the matrix
    terms, because it regroups them on the reduced set of cores
    participating in the mumps computation, and the user has to call
    reduceMatrices() whenever he updates values in the matrix storage
    connected to the interface: some communication is needed every time
    the terms of the connected matrix change. This somewhat gives up the
    nice single memory space otherwise shared between the connected
    matrix and mumps.
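    
    A hypothetical usage sketch in C++ is given below: only the class,
    method and parameter names (xLinearSystemSolverMumpsDistributed,
    connectMatrix, reduceMatrices, ratio_reduce_comm_) come from this
    commit; the header, the matrix/vector types, the solve call and the
    exact signatures are assumptions made for the illustration.
    
        // Hypothetical usage sketch, not taken from the library sources.
        #include <mpi.h>
        // #include "xLinearSystemSolverMumpsDistributed.h"  // assumed header name
        
        // DistributedMatrix, Vector and updateMatrixValues stand in for user
        // side types/code and are not part of xLinAlg.
        void assembleAndSolve(DistributedMatrix &A, Vector &rhs, Vector &sol,
                              MPI_Comm world)
        {
           xLinearSystemSolverMumpsDistributed solver(world);  // assumed ctor
        
           // New extra parameter: with n unknowns, mx = ratio_reduce_comm_ * n
           // caps the number of cores actually handed to mumps.
           const double ratio_reduce_comm_ = 1.e-4;  // problem dependent value
           solver.connectMatrix(A, ratio_reduce_comm_);
           solver.solve(rhs, sol);                   // assumed solve call
        
           // If the communicator was reduced, the interface keeps its own copy
           // of the matrix terms: after updating values in the connected
           // storage, regroup them before solving again.
           updateMatrixValues(A);
           solver.reduceMatrices();
           solver.solve(rhs, sol);
        }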
    
    When the communicator given to the mumps interface is smaller than
    this mx estimate, the interface behaves as before. The same holds if
    ratio_reduce_comm_ is null.
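    
    The selection rule can be sketched as follows; this is only a
    minimal, self-contained illustration of the behaviour described
    above, the function name and signature are made up for the example.
    
        #include <algorithm>
        #include <mpi.h>
        
        // Rough cap on the number of cores handed to mumps for an n-unknown
        // system, following the rule described in this commit message.
        int coresUsedByMumps(MPI_Comm comm, long long n, double ratio_reduce_comm)
        {
           int comm_size;
           MPI_Comm_size(comm, &comm_size);
        
           // A null ratio disables the mechanism: behave as before.
           if (ratio_reduce_comm <= 0.) return comm_size;
        
           // mx = ratio_reduce_comm * n, at least one core.
           const int mx = std::max(1, static_cast<int>(ratio_reduce_comm * n));
        
           // Communicator already small enough: nothing changes. Otherwise
           // only mx cores participate and terms must be regrouped on them.
           return std::min(comm_size, mx);
        }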
    
    This solution is an intermediate between the centralized and the
    fully distributed matrix formats for mumps. If the ratio leads to
    mx=1, an even cleaner implementation would be to switch to the
    centralized matrix format for mumps. TODO.
    
    For now memory consumption is not optimized, as the terms regrouped
    on the processes that hold the matrix may be duplicated (i.e. a term
    (i,j) may appear many times because it is present in many processes).
    This is not a problem for mumps, which sums them, but it costs
    memory. Reducing this consumption is not so easy: the communication
    buffers have to transfer all those terms, so their size may be large
    anyway. TODO.
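    
    One possible direction (not done in this commit) would be to sum the
    duplicated (i,j) entries once they are regrouped; a minimal sketch,
    assuming a plain COO storage, is given below. As said above, the
    communication buffers would still carry every contribution, so only
    the gathered storage would shrink.
    
        #include <map>
        #include <utility>
        #include <vector>
        
        struct CooTerm { int i, j; double v; };
        
        // Merge duplicated (i,j) entries by summing their values, which is
        // what mumps does internally anyway; only the stored size shrinks.
        std::vector<CooTerm> mergeDuplicates(const std::vector<CooTerm> &terms)
        {
           std::map<std::pair<int, int>, double> acc;
           for (const auto &t : terms) acc[{t.i, t.j}] += t.v;
        
           std::vector<CooTerm> merged;
           merged.reserve(acc.size());
           for (const auto &kv : acc)
              merged.push_back({kv.first.first, kv.first.second, kv.second});
           return merged;
        }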