Commit 0628714a authored by Alexis SALZMAN

[xTool] add xDeltaMemory, a basic memory profiler

The xDeltaMemory class is built in the same spirit as xDeltaTime.
A "start" method captures a state and an "end" method compares it with
the previously captured state. This comparison gives a measure of the
memory consumed between those two events.
To be as precise as possible, the string used to label a measure must
now be registered through the initAccu method, and the user must call
startAccu and endAccu with the integer returned by initAccu.
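
The handle-based calling pattern can be sketched without the real class.
AccuSketch below is a hypothetical stand-in, not xDeltaMemory itself: it
only mirrors the initAccu/startAccu/endAccu shape, and its "current"
counter is updated by hand where the real class relies on malloc hooks.
That a repeated label returns the same handle is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the xDeltaMemory calling pattern.
struct AccuSketch
{
    long long current = 0;             // bytes in use (hand-maintained in this sketch)
    std::map<std::string, int> index;  // label -> handle, resolved once in initAccu
    std::vector<long long> snapshot;   // value of "current" at the last startAccu
    std::vector<long long> delta;      // accumulated (end - start) differences

    int initAccu(const std::string &label)
    {
        auto it = index.find(label);
        if (it != index.end()) return it->second;  // assumed: same label, same handle
        const int id = static_cast<int>(delta.size());
        index.emplace(label, id);
        snapshot.push_back(0);
        delta.push_back(0);
        return id;
    }
    void startAccu(int id) { snapshot.at(id) = current; }
    void endAccu(int id) { delta.at(id) += current - snapshot.at(id); }
    long long get(int id) const { return delta.at(id); }
};

// Two start/end sequences on one handle accumulate into the same slot.
long long demoAccu()
{
    AccuSketch dm;
    const int id = dm.initAccu("two sequences");
    assert(dm.initAccu("two sequences") == id);
    dm.startAccu(id);
    dm.current += 100;  // pretend 100 bytes were allocated
    dm.endAccu(id);
    dm.startAccu(id);
    dm.current += 50;   // pretend 50 bytes were allocated
    dm.current -= 30;   // and 30 bytes were freed
    dm.endAccu(id);
    return dm.get(id);  // 100 + (50 - 30)
}
```

Only integers cross the start/end boundary, so no string handling takes
place inside a measured sequence.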

Like xDeltaTime, it reports its results as a table with parallel
statistics, plus the peak memory usage. This last figure comes from
getrusage (as in xMemoryMonitor) and corresponds to the maximum resident
set size. All other figures relate to what the user requested and do not
represent the true current memory consumption of the application: malloc
may already have reserved 2GB while the user's vectors use only 1.5GB.
It is this last figure (1.5GB) that xDeltaMemory reports.
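
For reference, a minimal sketch of reading the peak resident set size
with getrusage, assuming a POSIX system. The function names are
illustrative and this is not the code used by xDeltaMemory or
xMemoryMonitor. On Linux ru_maxrss is given in kilobytes.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <vector>

// Peak resident set size of the calling process as reported by getrusage,
// or -1 on failure. The unit is platform dependent (kilobytes on Linux).
long peakRssRaw()
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) return -1;
    return ru.ru_maxrss;
}

// The peak is a high-water mark: it does not go back down once the
// memory that produced it has been released.
bool peakNeverDecreases()
{
    const long before = peakRssRaw();
    {
        std::vector<char> block(32u << 20, 1);  // touch 32 MiB, then release it
        assert(block.back() == 1);
    }
    const long after = peakRssRaw();
    return after >= before;
}
```

This is why the peak cannot be used to measure what a given sequence of
user code allocated and freed.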

Only the heap is analyzed.

Behind the curtain:
===================
xDeltaMemory uses a deprecated GNU extension called hooks. The malloc,
realloc and free hooks give a way to attach your own code to those
functions. In our case we want to count what the user asked to allocate.
An ideal solution would be to count with the size argument. But when
freeing memory we only have a pointer in hand, so it is not easy to
decrease our counter. Instead xDeltaMemory uses malloc_usable_size, which
decodes the malloc bookkeeping to give the accessible memory associated
with a pointer returned by malloc. xDeltaMemory therefore overestimates
the size actually requested by the user, as malloc_usable_size returns
the requested size plus any padding added by malloc. But this is already
a fairly accurate measure. Compared to xMemoryMonitor or to mallinfo,
the measure is placed here on the client application side. The other
approaches can be described as system-level snapshots, and are thus
harder to use for a precise measure of user allocations.
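
The counting idea can be sketched as follows, assuming glibc. The hook
variables were removed from the public glibc API in version 2.34, so this
sketch uses explicit wrapper functions instead of hooks. The names
countedMalloc, countedFree and demoUsable are hypothetical and do not
come from xDeltaMemory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <malloc.h>  // malloc_usable_size (glibc)

static long long g_bytes = 0;  // running total of usable bytes handed out

// Allocate and add the usable size of the block to the counter.
void *countedMalloc(std::size_t n)
{
    void *p = std::malloc(n);
    if (p) g_bytes += static_cast<long long>(malloc_usable_size(p));
    return p;
}

// Only the pointer is known here; malloc_usable_size recovers the amount
// that was added at allocation time, so the counter can be decreased.
void countedFree(void *p)
{
    if (p) g_bytes -= static_cast<long long>(malloc_usable_size(p));
    std::free(p);
}

// Counter change observed for one n-byte request. It is at least n, and
// the counter is back at its starting value once the block is freed.
long long demoUsable(std::size_t n)
{
    const long long before = g_bytes;
    void *p = countedMalloc(n);
    assert(p != nullptr);
    const long long seen = g_bytes - before;
    countedFree(p);
    assert(g_bytes == before);
    return seen;
}
```

The padding explains why the reference output of the test reports 40
bytes for a 32-byte request.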

The hooks have been deprecated for a long time, yet with gcc 8 on
liger they are still available.

Relying on a GNU extension is clearly a limitation: it is not portable!
But all GNU-specific code is guarded by the __GNUC__ macro, so with other
compilers xDeltaMemory returns 0 for every measure.
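
A minimal sketch of such a guard, with the same extern "C" include as
xDeltaMemory.h. The function names usableOrZero and guardDemo are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#ifdef __GNUC__
extern "C"
{
#include <malloc.h>
}
#endif

// Usable size of a malloc'ed block, or 0 when the GNU-specific code is
// compiled out.
std::size_t usableOrZero(void *p)
{
#ifdef __GNUC__
    return malloc_usable_size(p);
#else
    (void)p;
    return 0;
#endif
}

// True when the guard behaves as described on the current compiler.
bool guardDemo()
{
    void *p = std::malloc(64);
    assert(p != nullptr);
    const std::size_t u = usableOrZero(p);
    std::free(p);
#ifdef __GNUC__
    return u >= 64;
#else
    return u == 0;
#endif
}
```

Note that __GNUC__ identifies the compiler rather than the C library. A
stricter guard could test __GLIBC__ instead, which is a suggestion and
not what this commit does.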

Measuring third-party library calls is possible only if the library is
also compiled with a GNU compiler and uses malloc.
Some tests on liger with MUMPS show that xDeltaMemory provides almost
the same information as MUMPS itself.

Note that interleaved measures are possible. A specific counter tracks
the start/end calls so that hooks are deactivated outside measuring
sequences, avoiding any extra computational cost.
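
One way such a counter can work is sketched below. HookSwitchSketch is a
hypothetical illustration: the "active" flag stands in for installing and
removing the real hooks, and the actual switch_hook_on/switch_hook_off
implementation may differ.

```cpp
#include <cassert>

// Hypothetical depth counter: hooks are installed on the first start and
// removed on the last matching end.
struct HookSwitchSketch
{
    int depth = 0;        // number of starts not yet matched by an end
    bool active = false;  // stands in for "hooks installed"
    int installs = 0;     // how many times the hooks were installed

    void switchOn()
    {
        if (depth++ == 0)
        {
            active = true;
            ++installs;
        }
    }
    void switchOff()
    {
        assert(depth > 0);  // every end must match an earlier start
        if (--depth == 0) active = false;
    }
};

// Two interleaved measures: the hooks are installed once and stay active
// until the outer measure ends.
int nestedInstallCount()
{
    HookSwitchSketch s;
    s.switchOn();   // outer measure starts
    s.switchOn();   // inner measure starts
    s.switchOff();  // inner measure ends, hooks stay active
    assert(s.active);
    s.switchOff();  // outer measure ends, hooks are removed
    assert(!s.active);
    return s.installs;
}
```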

The implementation uses static variables! It is therefore not currently thread safe.

New atomic test case:
=====================
A small test shows how to use xDeltaMemory and is for now the only
documentation.

TODO
====
doc+xNoDeltaMemory class
parent ce691cef
/*
This file is a part of eXlibris C++ Library
under the GNU Lesser General Public License.
See the NOTICE.md & LICENSE.md files for terms
and conditions.
*/
#ifndef XDELTAMEMORY_H
#define XDELTAMEMORY_H
#include <map>
#include <string>
#include <vector>
#include "mpi.h"
#ifdef __GNUC__
extern "C"
{
#include <malloc.h>
}
#endif
namespace xtool
{
class xDeltaMemory
{
public:
xDeltaMemory(MPI_Comm world_ = MPI_COMM_WORLD);
~xDeltaMemory();
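// Register a stage label once and get back the integer handle to pass to
// startAccu/endAccu/get.
int initAccu(std::string stage, bool local = false);
// Begin a measure: enable the hooks and record the current counter in m[id].
inline void startAccu(int id)
{
switch_hook_on();
set(m[id]);
}
// End a measure: refresh m_cur, accumulate the difference since startAccu
// into dm[id], then release the hooks.
inline void endAccu(int id)
{
set();
dm[id] += (m_cur - m[id]);
switch_hook_off();
}
// Print the table of accumulated measures.
void print();
// Accumulated value for handle id, in bytes.
double get(int id);
// Same, also filling min (mi), max (mx), mean (me) and sum (su) over processes.
double get(int id, double &mi, double &mx, double &me, double &su);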
private:
MPI_Comm world;
int n, nb_proc, proc_id;
std::map<std::string, int> strtoind;
std::vector<long long int> dm;
std::vector<long long int> m;
long long int zero;
long long int m_cur;
void set();
void set(long long int &m);
void reduce(double &aloc, double *min, double *max, double *mean, double *su) const;
void setOldPointer(void);
void switch_hook_on(void);
void switch_hook_off(void);
};
class xNoDeltaMemory
{
public:
xNoDeltaMemory(MPI_Comm world_ = MPI_COMM_WORLD) {}
~xNoDeltaMemory() = default;
};
} // namespace xtool
#endif
@@ -18,6 +18,7 @@ set(LIST
 ${CMAKE_CURRENT_SOURCE_DIR}/testSendOnlyKeysTraits
 ${CMAKE_CURRENT_SOURCE_DIR}/xExportStringDist
 ${CMAKE_CURRENT_SOURCE_DIR}/xDeltaTime
+${CMAKE_CURRENT_SOURCE_DIR}/xDeltaMemory
 )
 create_tests_from_list(${LIST})
enable_testing()
add_test(
NAME xDeltaMemory
COMMAND ${MPIEXEC} ${MPIEXEC_NUMPROC_FLAG} 3 ${MPIEXEC_PREFLAGS} ${INSTALL_PATH}/${TARGET_NAME} ${MPIEXEC_POSTFLAGS}
WORKING_DIRECTORY ${INSTALL_PATH}
)
add_test(
NAME ndiff_xDeltaMemory
COMMAND ${TESTNDIFF}
WORKING_DIRECTORY ${INSTALL_PATH}
)
set_tests_properties(ndiff_xDeltaMemory PROPERTIES DEPENDS xDeltaMemory )
/*
This file is a part of eXlibris C++ Library
under the GNU Lesser General Public License.
See the NOTICE.md & LICENSE.md files for terms
and conditions.
*/
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <unordered_map>
#include "xDeltaMemory.h"
#include "xDeltaTime.h"
using namespace std;
#include "xMemoryMonitor.h"
#define NBL 10
void foo(xtool::xDeltaTime &dt, xtool::xDeltaMemory &dm, xMemoryMonitor &mm, std::ofstream &out_ref, int proc_id)
{
int iddti0 = dt.initAccu("in loop cost DM");
int iddti1 = dt.initAccu("out of loop cost MM");
int iddmi0 = dm.initAccu("sum rand alloc");
int iddmi1 = dm.initAccu("dealloc");
int iddmi2 = dm.initAccu("a small chunck in foo");
int iddmi3 = dm.initAccu("alloc+dealloc");
dm.startAccu(iddmi3);
size_t k = 0;
double *pointers[NBL];
std::cout << "========================================" << std::endl;
std::cout << "Before alloc loop" << std::endl;
std::cout << "========================================" << std::endl;
dt.startAccu(iddti1);
int idmmi0 = mm.start("sum rand alloc");
dt.endAccu(iddti1);
for (size_t i = 0; i < NBL; ++i)
{
dt.startAccu(iddti0);
dm.startAccu(iddmi0);
dt.endAccu(iddti0);
size_t s = rand() % (5000000 * (i + 1)) + 1;
k += s;
cout << "Allocate " << i << "th chunck (B): " << s * 8 << endl;
pointers[i] = new double[s];
std::fill(pointers[i], pointers[i] + s, 3.);
std::cout << "========================================" << std::endl;
dt.startAccu(iddti0);
dm.endAccu(iddmi0);
dt.endAccu(iddti0);
}
dt.startAccu(iddti1);
mm.end(idmmi0);
dt.endAccu(iddti1);
std::cout << "After alloc loop" << std::endl;
std::cout << "========================================" << std::endl;
std::cout << "Pick allocation (B): " << k * 8 << std::endl;
std::cout << "========================================" << std::endl;
std::cout << "Memory leak introduced by not freeing last allocated block" << std::endl;
std::cout << "========================================" << std::endl;
dt.startAccu(iddti1);
idmmi0 = mm.start("dealloc");
dt.endAccu(iddti1);
for (size_t i = 0; i < NBL - 1; ++i)
{
dt.startAccu(iddti0);
dm.startAccu(iddmi1);
dt.endAccu(iddti0);
cout << "Deallocate " << i << "th chunck" << endl;
delete[] pointers[i];
std::cout << "========================================" << std::endl;
dt.startAccu(iddti0);
dm.endAccu(iddmi1);
dt.endAccu(iddti0);
}
dm.endAccu(iddmi3);
dt.startAccu(iddti1);
mm.end(idmmi0);
dt.endAccu(iddti1);
std::cout << "========================================" << std::endl;
std::cout << "A small chunck (32B) in foo (leak)" << std::endl;
idmmi0 = mm.start("a small chunck in foo");
dm.startAccu(iddmi2);
double *i = new double[4];
std::fill(i, i + 4, 4.);
dm.endAccu(iddmi2);
mm.end(idmmi0);
std::cout << "========================================" << std::endl;
double mx, mi, me, su;
double val = dm.get(iddmi0, mi, mx, me, su);
if (proc_id)
out_ref << "Retriving data for 'sum rand alloc' (GB) " << val / 1073741824. << std::endl;
else
out_ref << "Retriving data for 'sum rand alloc' (GB) " << val / 1073741824. << " min/max/mean/sum " << mi / 1073741824.
<< " " << mx / 1073741824. << " " << me / 1073741824. << " " << su / 1073741824. << std::endl;
out_ref << "========================================" << std::endl;
val = dm.get(iddmi1, mi, mx, me, su);
if (proc_id)
out_ref << "Retriving data for 'dealloc' (GB) " << val / 1073741824. << std::endl;
else
out_ref << "Retriving data for 'dealloc' (GB) " << val / 1073741824. << " min/max/mean/sum " << mi / 1073741824. << " "
<< mx / 1073741824. << " " << me / 1073741824. << " " << su / 1073741824. << std::endl;
out_ref << "========================================" << std::endl;
val = dm.get(iddmi2, mi, mx, me, su);
if (proc_id)
out_ref << "Retriving data for 'a small chunck in foo' (B) " << val << std::endl;
else
out_ref << "Retriving data for 'a small chunck in foo' (B) " << val << " min/max/mean/sum " << mi << " " << mx << " " << me
<< " " << su << std::endl;
out_ref << "========================================" << std::endl;
val = dm.get(iddmi3, mi, mx, me, su);
if (proc_id)
out_ref << "Retriving data for 'alloc+dealloc' (MB) " << val / 1048576. << std::endl;
else
out_ref << "Retriving data for 'alloc+dealloc' (MB) " << val / 1048576. << " min/max/mean/sum " << mi / 1048576. << " "
<< mx / 1048576. << " " << me / 1048576. << " " << su / 1048576. << std::endl;
out_ref << "========================================" << std::endl;
return;
}
int main(int argc, char *argv[])
{
MPI_Init(&argc, &argv);
int proc_id;
MPI_Comm_rank(MPI_COMM_WORLD, &proc_id);
srand((proc_id + 1) * 37);
string no = "proc_" + std::to_string(proc_id) + "_output.txt";
freopen(no.c_str(), "w", stdout);
std::ofstream out_ref;
string noo = "reference_" + std::to_string(proc_id) + ".txt";
out_ref.open(noo.c_str());
out_ref << fixed << std::setprecision(2);
std::cout << "==Start ===============================" << std::endl;
xtool::xDeltaTime dt;
xtool::xDeltaMemory dm;
xMemoryMonitor mm;
int iddm = dm.initAccu("foo");
int iddmm = mm.start("foo");
dm.startAccu(iddm);
foo(dt, dm, mm, out_ref, proc_id);
dm.endAccu(iddm);
mm.end(iddmm);
std::cout << "========================================" << std::endl;
double mx, mi, me, su;
double val = dm.get(iddm, mi, mx, me, su);
const int small_chunck = 10;
if (!proc_id)
{
/* During installation some small variations have been observed depending on the
* way tests are launched. The cause is unclear, so to avoid problems
* this line is removed from the reference:
out_ref << "Retriving data for foo (MB) " << val / 1048576. << " min/max/mean/sum " << mi / 1048576. << " " << mx / 1048576.
<< " " << me / 1048576. << " " << su / 1048576. << std::endl;
*/
std::cout << "========================================" << std::endl;
iddm = dm.initAccu("a small chunck i", true);
std::cout << "========================================" << std::endl;
std::cout << "A small chunck i (" << small_chunck * 8 << "B) in main P0 (leak)" << std::endl;
dm.startAccu(iddm);
double *i = new double[small_chunck];
std::fill(i, i + small_chunck, 7.2);
dm.endAccu(iddm);
out_ref << "========================================" << std::endl;
out_ref << "Retriving data for small chunck i " << dm.get(iddm) << "B" << std::endl;
std::cout << "========================================" << std::endl;
std::cout << "Another small chunck k (" << small_chunck * 80 << "B) in main P0 (leak)" << std::endl;
iddmm = mm.start("another small chunck k");
double *k = new double[small_chunck * 10];
std::fill(k, k + small_chunck * 10, 4.2);
mm.end(iddmm);
}
if (proc_id)
{
out_ref << "Retriving data for foo (MB) " << val / 1048576. << std::endl;
std::cout << "========================================" << std::endl;
std::cout << "A small chunck j (" << small_chunck * 16 << "B) in main Px (leak)" << std::endl;
iddm = dm.initAccu("A small chunck j", true);
dm.startAccu(iddm);
double *j = new double[small_chunck * 2];
j[small_chunck - 1] = 3.2;
dm.endAccu(iddm);
out_ref << "========================================" << std::endl;
out_ref << "Retriving data for small chunck j " << dm.get(iddm) << "B" << std::endl;
std::cout << "========================================" << std::endl;
std::cout << "Another small chunck k (" << small_chunck * 8000 << "B) in main Px (leak)" << std::endl;
iddm = dm.initAccu("Another small chunck k", true);
iddmm = mm.start("Another small chunck k");
dm.startAccu(iddm);
double *k = new double[small_chunck * 1000];
std::fill(k, k + small_chunck * 1000, 4.2);
dm.endAccu(iddm);
mm.end(iddmm);
std::cout << "========================================" << std::endl;
out_ref << "========================================" << std::endl;
out_ref << "Retriving data for small chunck k " << dm.get(iddm) / 1024. << "KB" << std::endl;
out_ref << "========================================" << std::endl;
}
MPI_Barrier(MPI_COMM_WORLD);
dm.print();
#ifdef PROFILE
dt.print();
#endif
mm.print(cout);
MPI_Finalize();
return 0;
}
Retriving data for 'sum rand alloc' (GB) 1.41 min/max/mean/sum 0.84 1.41 1.18 3.53
========================================
Retriving data for 'dealloc' (GB) -1.15 min/max/mean/sum -1.15 -0.73 -0.98 -2.95
========================================
Retriving data for 'a small chunck in foo' (B) 40.00 min/max/mean/sum 40.00 40.00 40.00 120.00
========================================
Retriving data for 'alloc+dealloc' (MB) 261.82 min/max/mean/sum 115.20 261.82 199.91 599.72
========================================
========================================
Retriving data for small chunck i 88.00B
Retriving data for 'sum rand alloc' (GB) 0.84
========================================
Retriving data for 'dealloc' (GB) -0.73
========================================
Retriving data for 'a small chunck in foo' (B) 40.00
========================================
Retriving data for 'alloc+dealloc' (MB) 115.20
========================================
Retriving data for foo (MB) 115.20
========================================
Retriving data for small chunck j 168.00B
========================================
Retriving data for small chunck k 78.13KB
========================================
Retriving data for 'sum rand alloc' (GB) 1.28
========================================
Retriving data for 'dealloc' (GB) -1.07
========================================
Retriving data for 'a small chunck in foo' (B) 40.00
========================================
Retriving data for 'alloc+dealloc' (MB) 222.70
========================================
Retriving data for foo (MB) 222.70
========================================
Retriving data for small chunck j 168.00B
========================================
Retriving data for small chunck k 78.13KB
========================================
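
The min/max/mean/sum columns printed by rank 0 can be cross-checked from
the per-rank values above. The sketch below is plain C++ and assumes
nothing about how xDeltaMemory::reduce is implemented (presumably with
MPI reductions). Stats and summarize are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Stats
{
    double min, max, mean, sum;
};

// Summary statistics over one value per process.
Stats summarize(const std::vector<double> &perRank)
{
    assert(!perRank.empty());
    const auto mm = std::minmax_element(perRank.begin(), perRank.end());
    const double sum = std::accumulate(perRank.begin(), perRank.end(), 0.0);
    return {*mm.first, *mm.second, sum / perRank.size(), sum};
}
```

With the three 'sum rand alloc' values 1.41, 0.84 and 1.28 GB, this
gives min 0.84, max 1.41, sum 3.53 and mean 1.18 at two decimals, in
agreement with the rank-0 line. The inputs are themselves rounded, so
agreement is only expected at the printed precision.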