Commit 0628714a authored by Alexis SALZMAN

[xTool] add xDeltaMemory, a basic memory profiler

The xDeltaMemory class is built in the same spirit as xDeltaTime.
A "start" method captures a state and an "end" method compares it with
the previously captured state. This comparison gives a measure of the
memory consumed between those two events.
To be as precise as possible, the string that labels a measure is now
passed only to the initAccu method; the user must then call startAccu and
endAccu with the integer returned by initAccu.
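
The handle-based pattern described above can be sketched as follows. This is a minimal illustration, not the xDeltaMemory implementation: the class name DeltaAccumulator, its get method and the global g_bytes_in_use (standing in for the counter maintained by the hooks) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the byte counter that the hooks would maintain.
static long long g_bytes_in_use = 0;

// Hypothetical accumulator mirroring the initAccu/startAccu/endAccu pattern.
class DeltaAccumulator
{
  public:
   // The label is handled once, here; later calls only use the integer handle.
   int initAccu(const std::string &label)
   {
      labels.push_back(label);
      start.push_back(0);
      delta.push_back(0);
      return static_cast<int>(labels.size()) - 1;
   }
   // Capture the counter at the beginning of a measured region.
   void startAccu(int id)
   {
      assert(valid(id));
      start[id] = g_bytes_in_use;
   }
   // Accumulate the difference since the matching startAccu.
   void endAccu(int id)
   {
      assert(valid(id));
      delta[id] += g_bytes_in_use - start[id];
   }
   long long get(int id) const
   {
      assert(valid(id));
      return delta[id];
   }

  private:
   bool valid(int id) const { return id >= 0 && static_cast<std::size_t>(id) < delta.size(); }
   std::vector<std::string> labels;
   std::vector<long long> start;
   std::vector<long long> delta;
};
```

Because startAccu and endAccu take only an integer, nothing string-related happens inside the measured region in this sketch.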

It reports results like xDeltaTime, as a table with parallel statistics and
the peak memory usage. The latter comes from getrusage (as in
xMemoryMonitor) and corresponds to the maximum resident set size. All other
figures relate to what the user requested and do not represent the
application's true current memory consumption: malloc may already have
reserved 2GB while the user's vectors use only 1.5GB. It is this latter
figure that xDeltaMemory reports.
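
As an illustration of the peak figure, here is a minimal sketch of reading the maximum resident set size with getrusage. The function name peakRssBytes is hypothetical, and the conversion assumes Linux, where ru_maxrss is expressed in kilobytes (other systems may use a different unit).

```cpp
#include <cassert>
#include <sys/resource.h>

// Returns the peak resident set size of the calling process in bytes.
// Assumption: Linux semantics, where ru_maxrss is given in kilobytes.
long long peakRssBytes()
{
   struct rusage usage;
   const int rc = getrusage(RUSAGE_SELF, &usage);
   // getrusage(RUSAGE_SELF, ...) is not expected to fail with a valid pointer.
   assert(rc == 0);
   (void)rc;  // silence the unused-variable warning when NDEBUG is defined
   return static_cast<long long>(usage.ru_maxrss) * 1024LL;
}
```

This value is a process-wide high-water mark, so it never decreases and includes memory that the heap counters do not see.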

Only Heap is analyzed.

Behind the curtain:
===================
xDeltaMemory uses a deprecated GNU extension called hooks. The malloc, realloc
and free hooks provide a way to add your own behavior to those
functions. In our case we want to count what the user asked to allocate.
An ideal solution would be to count with the size argument. But when
freeing memory we only have a pointer in hand, so it is not easy to decrease
our counter. Instead xDeltaMemory uses malloc_usable_size, which reads from
the malloc bookkeeping the size of the accessible memory associated with a
pointer returned by malloc. xDeltaMemory therefore overestimates the real
size asked for by the user, as malloc_usable_size returns the requested size
plus any padding added by malloc. But this is already a fairly
accurate measure. Compared to xMemoryMonitor or to the use of mallinfo, here
the measure is placed on the client application side. The other approaches
can be described as system-level snapshots, and are thus harder
to use for a precise measure of user allocations.
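
The counting idea can be sketched without hooks, using explicit wrapper functions instead. This is a swapped-in technique for illustration only: xDeltaMemory itself installs hooks, which were removed from glibc in release 2.34. The names trackedMalloc, trackedFree and trackedBytes are hypothetical, and malloc_usable_size is a glibc extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <malloc.h>  // malloc_usable_size (glibc extension)

// Hypothetical counter of usable heap bytes obtained through the wrappers.
static long long trackedBytes = 0;

// Allocate and add the usable size (requested size plus possible padding).
void *trackedMalloc(std::size_t size)
{
   void *p = std::malloc(size);
   if (p) trackedBytes += static_cast<long long>(malloc_usable_size(p));
   return p;
}

// Only the pointer is known here, so the usable size is queried before freeing.
void trackedFree(void *p)
{
   if (p) trackedBytes -= static_cast<long long>(malloc_usable_size(p));
   assert(trackedBytes >= 0);
   std::free(p);
}
```

The same quantity is added on allocation and subtracted on release, so the counter returns to its previous value once a block is freed, even though each contribution may exceed the requested size.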

The hooks seem to have been deprecated for a long time ... With gcc 8 on
liger they are still available ....

The GNU extension aspect is clearly a limitation: this is not portable!
But all GNU-specific parts are guarded by the __GNUC__ macro, so with
other compilers xDeltaMemory will return 0 for all measures.

Measuring third-party library calls is possible only if
the library is also compiled with a GNU compiler and uses malloc.
Some tests on liger with MUMPS show that xDeltaMemory provides almost
the same information as MUMPS itself.

Note that interleaving measures is possible. A specific counter tracks the
start/end calls and deactivates the hooks outside measuring sequences to
avoid any extra computational cost.
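
The start/end counter can be pictured with the following minimal sketch. The class NestingGuard and its members are hypothetical; it only models the activation logic, with a boolean standing in for installing and removing the hooks.

```cpp
#include <cassert>

// Hypothetical guard: tracking is active only while at least one measure is open.
class NestingGuard
{
  public:
   // First open measure activates tracking; nested ones only increase the depth.
   void enter()
   {
      if (++depth == 1) active = true;
   }
   // Last closed measure deactivates tracking.
   void leave()
   {
      assert(depth > 0);  // an unmatched leave is a usage error
      if (--depth == 0) active = false;
   }
   bool isActive() const { return active; }

  private:
   int depth = 0;
   bool active = false;
};
```

The sketch uses a signed depth and checks it before decrementing, since an unsigned counter would wrap around on an unmatched leave.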

The implementation uses static variables! It is therefore not currently thread safe.

New atomic test case:
=====================
A small test shows how to use xDeltaMemory and is for now the only
documentation.

TODO
====
doc+xNoDeltaMemory class
parent ce691cef
/*
This file is a part of eXlibris C++ Library
under the GNU Lesser General Public License.
See the NOTICE.md & LICENSE.md files for terms
and conditions.
*/
// ----------------------------------------------------------------------------
// HEADERS
// ----------------------------------------------------------------------------
#include "xDeltaMemory.h"
#include <sys/resource.h>
#include <sys/time.h>
#include <cassert>
#include <iomanip>
#include <iostream>
#include <map>
#include <sstream>
using namespace std;
// =================================================================
// malloc,realloc,free hooks
//
// Declaration part
static void *(*old_malloc_hook)(size_t, const void *);
static void *(*old_realloc_hook)(void *, size_t, const void *);
static void (*old_free_hook)(void *, const void *);
static long long int sum = 0;
static size_t hook_counter = 0;
static void *xDeltaMemory_malloc_hook(size_t size, const void *caller);
static void *xDeltaMemory_realloc_hook(void *ptr, size_t size, const void *caller);
static void xDeltaMemory_free_hook(void *ptr, const void *caller);
static void switch_hooks_on()
{
#ifdef __GNUC__
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
__malloc_hook = xDeltaMemory_malloc_hook;
__realloc_hook = xDeltaMemory_realloc_hook;
__free_hook = xDeltaMemory_free_hook;
#pragma GCC diagnostic pop
#endif
}
static void switch_hooks_off()
{
#ifdef __GNUC__
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
__malloc_hook = old_malloc_hook;
__realloc_hook = old_realloc_hook;
__free_hook = old_free_hook;
#pragma GCC diagnostic pop
#endif
}
// Implementation part
static void *xDeltaMemory_malloc_hook(size_t size, const void *caller)
{
void *result = nullptr;
// switch from specific hook to std hook
switch_hooks_off();
// std call
result = malloc(size);
// cout<<"mh "<<sum<<" "<<malloc_usable_size(result)<<endl;
// counter update (takes padding into account but not the extra bytes associated with the block.
// Thus it is a lower estimate of what is consumed but an upper estimate of what is
// asked)
sum += malloc_usable_size(result);
// not clear why those 3 lines are required !
// commented for now
// old_malloc_hook = __malloc_hook;
// old_realloc_hook = __realloc_hook;
// old_free_hook = __free_hook;
// switch back to specific hook
switch_hooks_on();
return result;
}
static void *xDeltaMemory_realloc_hook(void *ptr, size_t size, const void *caller)
{
void *result = nullptr;
// switch from specific hook to std hook
switch_hooks_off();
// counter update before reallocation: remove what is already allocated to avoid counting it
// twice. If ptr is null, realloc behaves like malloc and nothing has to be removed from the count
size_t delta = 0;
if (ptr) delta = malloc_usable_size(ptr);
// std call
result = realloc(ptr, size);
// cout<<"rh "<<sum<<" "<<delta<<" "<<malloc_usable_size(result)<<endl;
// counter update: new size counted for non null size
if (size) sum = sum - delta + malloc_usable_size(result);
// counter update: freed size counted for null size
else
sum = sum - delta;
// not clear why those 3 lines are required !
// commented for now
// old_malloc_hook = __malloc_hook;
// old_realloc_hook = __realloc_hook;
// old_free_hook = __free_hook;
// switch back to specific hook
switch_hooks_on();
return result;
}
static void xDeltaMemory_free_hook(void *ptr, const void *caller)
{
// switch from specific hook to std hook
switch_hooks_off();
// cout<<"dh "<<sum<<" "<<malloc_usable_size(ptr)<<endl;
// counter update (see remark in malloc hook)
sum -= malloc_usable_size(ptr);
// std call
free(ptr);
// not clear why those 3 lines are required !
// commented for now
// old_malloc_hook = __malloc_hook;
// old_realloc_hook = __realloc_hook;
// old_free_hook = __free_hook;
// switch back to specific hook
switch_hooks_on();
}
// ===============================================================================================
namespace xtool
{
std::pair<double, std::string> scaleUnit(double val)
{
const double B = 1.;
const double KB = 1024. * B;
const double MB = 1024. * KB;
const double GB = 1024. * MB;
const double TB = 1024. * GB;
if (val < 0) val = -val;
if (val > TB)
{
// TB
return std::make_pair(1. / TB, "TB");
}
else if (val > GB)
{
// GB
return std::make_pair(1. / GB, "GB");
}
else if (val > MB)
{
// MB
return std::make_pair(1. / MB, "MB");
}
else if (val > KB)
{
// KB
return std::make_pair(1. / KB, "KB");
}
else
{
// B
return std::make_pair(1., "B ");
}
throw -1234;
}
xDeltaMemory::xDeltaMemory(MPI_Comm world_) : world(world_), n(0), nb_proc(1), proc_id(0)
{
MPI_Comm_size(world, &nb_proc);
MPI_Comm_rank(world, &proc_id);
// init zero
zero = 0;
// set hooks saving pointers (old)
setOldPointer();
}
xDeltaMemory::~xDeltaMemory() { switch_hooks_off(); }
void xDeltaMemory::set() { m_cur = sum; }
void xDeltaMemory::set(long long int &m) { m = sum; }
int xDeltaMemory::initAccu(std::string stage, bool local)
{
switch_hooks_off(); // remove xDeltaMemory from counts
int i = dm.size();
if (local)
strtoind[stage] = -i;
else
strtoind[stage] = i;
dm.push_back(zero);
m.push_back(zero);
switch_hooks_on(); // counts again
return i;
}
void xDeltaMemory::print()
{
switch_hook_off();  // remove xDeltaMemory from counts if not in the middle of a count, otherwise things will be counted
struct rusage rusage;
getrusage(RUSAGE_SELF, &rusage);
double salloc = rusage.ru_maxrss * 1024.;
assert(salloc != 0.);
double mi, mx, me, su;
reduce(salloc, &mi, &mx, &me, &su);
string Heapu("Heap asked");
string Peack("Peak memory usage up to this point (");
auto scale = scaleUnit(salloc);
auto scale2 = scale;
cout << " " << endl;
cout
<< "==xDeltaMemory output ================================================================================================"
<< fixed << setprecision(5) << endl;
cout << Peack + scale.second << ") : " << salloc * scale.first;
if (!proc_id) cout << " min/max/average : " << mi * scale.first << "/" << mx * scale.first << "/" << me * scale.first;
cout
<< endl
<< "======================================================================================================================"
<< endl;
bool do_more_print = false;
int j;
std::map<std::string, int>::iterator it = strtoind.begin();
std::map<std::string, int>::iterator itend = strtoind.end();
if (it != itend) do_more_print = true;
if (do_more_print)
{
bool got_local = false;
if (proc_id)
{
cout << setfill('=') << setw(82) << " " << endl;
cout << setfill(' ') << "| " << setw(32) << Heapu << " | " << setw(42) << "for"
<< " |" << endl;
cout << setfill('=') << setw(82) << " " << endl;
cout << setfill(' ') << "| " << setw(2) << "U."
<< " | " << setw(12) << "val"
<< " | " << setw(12) << "% total"
<< " | " << setw(44) << " |" << endl;
for (; it != itend; ++it)
{
j = it->second;
if (j > -1)
{
double uallocj = dm[j];
reduce(uallocj, &mi, &mx, &me, &su);
scale = scaleUnit(uallocj);
cout << "| " << setw(2) << scale.second << " | " << setprecision(2) << setw(12) << uallocj * scale.first << " | "
<< setprecision(1) << setw(12) << 100 * uallocj / salloc << " | " << setw(42) << it->first << " |" << endl;
}
else
got_local = true;
}
cout << setfill('=') << setw(82) << " " << setfill(' ') << endl;
}
else
{
cout << setfill('=') << setw(130) << " " << endl;
cout << setfill(' ') << "| " << setw(80) << Heapu << " | " << setw(42) << "for"
<< " |" << endl;
cout << setfill('=') << setw(130) << " " << endl;
cout << setfill(' ') << "| " << setw(9) << "val sum"
<< " | " << setw(2) << "U."
<< " | " << setw(8) << "val min"
<< " | " << setw(8) << "val max"
<< " | " << setw(8) << "val"
<< " | " << setw(8) << "% total"
<< " | " << setw(8) << "val avg"
<< " | " << setw(8) << "% t. avg"
<< " | " << setw(44) << " |" << endl;
for (; it != itend; ++it)
{
j = it->second;
if (j > -1)
{
double uallocj = dm[j];
reduce(uallocj, &mi, &mx, &me, &su);
scale = scaleUnit(me);
scale2 = scaleUnit(su);
cout << "| " << fixed << setprecision(2) << setw(8) << su * scale2.first << setw(2) << scale2.second << "| "
<< setw(2) << scale.second << " | " << setw(8) << mi * scale.first << " | " << setw(8) << mx * scale.first
<< " | " << setw(8) << uallocj * scale.first << " | " << setprecision(1) << setw(8) << 100 * uallocj / salloc
<< " | " << setprecision(2) << setw(8) << me * scale.first << " | " << setprecision(1) << setw(8)
<< 100 * me / salloc;
cout << " | " << setw(42) << it->first << " |" << endl;
}
else
got_local = true;
}
cout << setfill('=') << setw(130) << " " << setfill(' ') << endl;
}
if (got_local)
{
cout << setfill(' ') << "| " << setw(32) << "local"
<< " |" << endl;
cout << setfill('=') << setw(82) << " " << endl;
cout << setfill(' ') << "| " << setw(32) << Heapu << " | " << setw(42) << "for"
<< " |" << endl;
cout << setfill('=') << setw(82) << " " << endl;
cout << setfill(' ') << "| " << setw(2) << "U."
<< " | " << setw(12) << "val"
<< " | " << setw(12) << "% total"
<< " | " << setw(44) << " |" << endl;
for (it = strtoind.begin(); it != itend; ++it)
{
j = it->second;
if (j < 0)
{
j = -j;
double uallocj = dm[j];
scale = scaleUnit(uallocj);
cout << "| " << setw(2) << scale.second << " | " << setprecision(2) << setw(12) << uallocj * scale.first << " | "
<< setprecision(1) << setw(12) << 100 * uallocj / salloc << " | " << setw(42) << it->first << " |" << endl;
}
}
cout << setfill('=') << setw(82) << " " << setfill(' ') << endl;
}
}
cout
<< "==End xDeltaMemory output ============================================================================================"
<< endl;
switch_hook_on(); // counts again
}
double xDeltaMemory::get(int id)
{
if (id < 0) id = -id;
return dm[id];
}
double xDeltaMemory::get(int id, double &mi, double &mx, double &me, double &su)
{
if (id < 0) id = -id;
double val = dm[id];
reduce(val, &mi, &mx, &me, &su);
return val;
}
void xDeltaMemory::reduce(double &val, double *min, double *max, double *mean, double *su) const
{
double *data = &val;
MPI_Reduce(data, max, 1, MPI_DOUBLE, MPI_MAX, 0, world);
MPI_Reduce(data, min, 1, MPI_DOUBLE, MPI_MIN, 0, world);
MPI_Reduce(data, su, 1, MPI_DOUBLE, MPI_SUM, 0, world);
*mean = (*su) / nb_proc;
return;
}
void xDeltaMemory::setOldPointer()
{
if (hook_counter)
{
cout << "hook_counter when invoking setOldPointer should be null " << endl;
cout << "Only one xDeltaMemory possible" << endl;
throw -78;
}
#ifdef __GNUC__
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
old_malloc_hook = __malloc_hook;
old_realloc_hook = __realloc_hook;
old_free_hook = __free_hook;
#pragma GCC diagnostic pop
#endif
}
void xDeltaMemory::switch_hook_on()
{
#ifdef __GNUC__
if (++hook_counter < 2)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
__malloc_hook = xDeltaMemory_malloc_hook;
__realloc_hook = xDeltaMemory_realloc_hook;
__free_hook = xDeltaMemory_free_hook;
#pragma GCC diagnostic pop
sum = 0;
}
#endif
}
void xDeltaMemory::switch_hook_off()
{
#ifdef __GNUC__
if (--hook_counter < 1)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
__malloc_hook = old_malloc_hook;
__realloc_hook = old_realloc_hook;
__free_hook = old_free_hook;
#pragma GCC diagnostic pop
sum = 0;
}
#endif
}
} // namespace xtool
/*
This file is a part of eXlibris C++ Library
under the GNU Lesser General Public License.
See the NOTICE.md & LICENSE.md files for terms
and conditions.
*/
#ifndef XDELTAMEMORY_H
#define XDELTAMEMORY_H
#include <map>
#include <string>
#include <vector>
#include "mpi.h"
#ifdef __GNUC__
extern "C"
{
#include <malloc.h>
}
#endif
namespace xtool
{
class xDeltaMemory
{
public:
xDeltaMemory(MPI_Comm world_ = MPI_COMM_WORLD);
~xDeltaMemory();
int initAccu(std::string stage, bool local = false);
inline void startAccu(int id)
{
switch_hook_on();
set(m[id]);
}
inline void endAccu(int id)
{
set();
dm[id] += (m_cur - m[id]);
switch_hook_off();
}
void print();
double get(int id);
double get(int id, double &mi, double &mx, double &me, double &su);
private:
MPI_Comm world;
int n, nb_proc, proc_id;
std::map<std::string, int> strtoind;
std::vector<long long int> dm;
std::vector<long long int> m;
long long int zero;
long long int m_cur;
void set();
void set(long long int &m);
void reduce(double &aloc, double *min, double *max, double *mean, double *su) const;
void setOldPointer(void);
void switch_hook_on(void);
void switch_hook_off(void);
};
class xNoDeltaMemory
{
public:
xNoDeltaMemory(MPI_Comm world_ = MPI_COMM_WORLD) {}
~xNoDeltaMemory() = default;
};
} // namespace xtool
#endif
......@@ -18,6 +18,7 @@ set(LIST
${CMAKE_CURRENT_SOURCE_DIR}/testSendOnlyKeysTraits
${CMAKE_CURRENT_SOURCE_DIR}/xExportStringDist
${CMAKE_CURRENT_SOURCE_DIR}/xDeltaTime
${CMAKE_CURRENT_SOURCE_DIR}/xDeltaMemory
)
create_tests_from_list(${LIST})
......
enable_testing()
add_test(
NAME xDeltaMemory
COMMAND ${MPIEXEC} ${MPIEXEC_NUMPROC_FLAG} 3 ${MPIEXEC_PREFLAGS} ${INSTALL_PATH}/${TARGET_NAME} ${MPIEXEC_POSTFLAGS}
WORKING_DIRECTORY ${INSTALL_PATH}
)
add_test(
NAME ndiff_xDeltaMemory
COMMAND ${TESTNDIFF}
WORKING_DIRECTORY ${INSTALL_PATH}
)
set_tests_properties(ndiff_xDeltaMemory PROPERTIES DEPENDS xDeltaMemory )
/*
This file is a part of eXlibris C++ Library
under the GNU Lesser General Public License.
See the NOTICE.md & LICENSE.md files for terms
and conditions.
*/
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <unordered_map>
#include "xDeltaMemory.h"
#include "xDeltaTime.h"
using namespace std;
#include "xMemoryMonitor.h"
#define NBL 10
void foo(xtool::xDeltaTime &dt, xtool::xDeltaMemory &dm, xMemoryMonitor &mm, std::ofstream &out_ref, int proc_id)
{
int iddti0 = dt.initAccu("in loop cost DM");
int iddti1 = dt.initAccu("out of loop cost MM");
int iddmi0 = dm.initAccu("sum rand alloc");
int iddmi1 = dm.initAccu("dealloc");
int iddmi2 = dm.initAccu("a small chunck in foo");
int iddmi3 = dm.initAccu("alloc+dealloc");
dm.startAccu(iddmi3);
size_t k = 0;
double *pointers[NBL];
std::cout << "========================================" << std::endl;
std::cout << "Before alloc loop" << std::endl;
std::cout << "========================================" << std::endl;
dt.startAccu(iddti1);
int idmmi0 = mm.start("sum rand alloc");
dt.endAccu(iddti1);
for (size_t i = 0; i < NBL; ++i)
{
dt.startAccu(iddti0);
dm.startAccu(iddmi0);
dt.endAccu(iddti0);
size_t s = rand() % (5000000 * (i + 1)) + 1;
k += s;
cout << "Allocate " << i << "th chunck (B): " << s * 8 << endl;
pointers[i] = new double[s];
std::fill(pointers[i], pointers[i] + s, 3.);
std::cout << "========================================" << std::endl;
dt.startAccu(iddti0);
dm.endAccu(iddmi0);
dt.endAccu(iddti0);
}
dt.startAccu(iddti1);
mm.end(idmmi0);
dt.endAccu(iddti1);
std::cout << "After alloc loop" << std::endl;
std::cout << "========================================" << std::endl;
std::cout << "Pick allocation (B): " << k * 8 << std::endl;
std::cout << "========================================" << std::endl;
std::cout << "Memory leak introduced by not freeing last allocated block" << std::endl;
std::cout << "========================================" << std::endl;
dt.startAccu(iddti1);
idmmi0 = mm.start("dealloc");
dt.endAccu(iddti1);
for (size_t i = 0; i < NBL - 1; ++i)
{
dt.startAccu(iddti0);
dm.startAccu(iddmi1);
dt.endAccu(iddti0);
cout << "Deallocate " << i << "th chunck" << endl;
delete[] pointers[i];
std::cout << "========================================" << std::endl;
dt.startAccu(iddti0);
dm.endAccu(iddmi1);
dt.endAccu(iddti0);
}
dm.endAccu(iddmi3);
dt.startAccu(iddti1);
mm.end(idmmi0);
dt.endAccu(iddti1);
std::cout << "========================================" << std::endl;
std::cout << "A small chunck (32B) in foo (leak)" << std::endl;
idmmi0 = mm.start("a small chunck in foo");
dm.startAccu(iddmi2);
double *i = new double[4];
std::fill(i, i + 4, 4.);
dm.endAccu(iddmi2);
mm.end(idmmi0);
std::cout << "========================================" << std::endl;
double mx, mi, me, su;
double val = dm.get(iddmi0, mi, mx, me, su);
if (proc_id)
out_ref << "Retriving data for 'sum rand alloc' (GB) " << val / 1073741824. << std::endl;
else
out_ref << "Retriving data for 'sum rand alloc' (GB) " << val / 1073741824. << " min/max/mean/sum " << mi / 1073741824.
<< " " << mx / 1073741824. << " " << me / 1073741824. << " " << su / 1073741824. << std::endl;
out_ref << "========================================" << std::endl;
val = dm.get(iddmi1, mi, mx, me, su);
if (proc_id)
out_ref << "Retriving data for 'dealloc' (GB) " << val / 1073741824. << std::endl;
else
out_ref << "Retriving data for 'dealloc' (GB) " << val / 1073741824. << " min/max/mean/sum " << mi / 1073741824. << " "
<< mx / 1073741824. << " " << me / 1073741824. << " " << su / 1073741824. << std::endl;
out_ref << "========================================" << std::endl;
val = dm.get(iddmi2, mi, mx, me, su);
if (proc_id)
out_ref << "Retriving data for 'a small chunck in foo' (B) " << val << std::endl;
else
out_ref << "Retriving data for 'a small chunck in foo' (B) " << val << " min/max/mean/sum " << mi << " " << mx << " " << me
<< " " << su << std::endl;
out_ref << "========================================" << std::endl;
val = dm.get(iddmi3, mi, mx, me, su);
if (proc_id)
out_ref << "Retriving data for 'alloc+dealloc' (MB) " << val / 1048576. << std::endl;
else
out_ref << "Retriving data for 'alloc+dealloc' (MB) " << val / 1048576. << " min/max/mean/sum " << mi / 1048576. << " "
<< mx / 1048576. << " " << me / 1048576. << " " << su / 1048576. << std::endl;
out_ref << "========================================" << std::endl;
return;
}
int main(int argc, char *argv[])
{
MPI_Init(&argc, &argv);
int proc_id;
MPI_Comm_rank(MPI_COMM_WORLD, &proc_id);
srand((proc_id + 1) * 37);
string no = "proc_" + std::to_string(proc_id) + "_output.txt";
freopen(no.c_str(), "w", stdout);
std::ofstream out_ref;
string noo = "reference_" + std::to_string(proc_id) + ".txt";
out_ref.open(noo.c_str());
out_ref << fixed << std::setprecision(2);
std::cout << "==Start ===============================" << std::endl;
xtool::xDeltaTime dt;
xtool::xDeltaMemory dm;
xMemoryMonitor mm;
int iddm = dm.initAccu("foo");
int iddmm = mm.start("foo");
dm.startAccu(iddm);
foo(dt, dm, mm, out_ref, proc_id);
dm.endAccu(iddm);
mm.end(iddmm);
std::cout << "========================================" << std::endl;
double mx, mi, me, su;
double val = dm.get(iddm, mi, mx, me, su);
const int small_chunck = 10;
if (!proc_id)
{
/* During installation some small variations have been observed depending on the
* way tests are launched. Not clear to me why ?! To avoid problems,
* removed from reference
out_ref << "Retriving data for foo (MB) " << val / 1048576. << " min/max/mean/sum " << mi / 1048576. << " " << mx / 1048576.
<< " " << me / 1048576. << " " << su / 1048576. << std::endl;
*/
std::cout << "========================================" << std::endl;
iddm = dm.initAccu("a small chunck i", true);
std::cout << "========================================" << std::endl;
std::cout << "A small chunck i (" << small_chunck * 8 << "B) in main P0 (leak)" << std::endl;
dm.startAccu(iddm);
double *i = new double[small_chunck];
std::fill(i, i + small_chunck, 7.2);
dm.endAccu(iddm);
out_ref << "========================================" << std::endl;
out_ref << "Retriving data for small chunck i " << dm.get(iddm) << "B" << std::endl;
std::cout << "========================================" << std::endl;
std::cout << "Another small chunck k (" << small_chunck * 80 << "B) in main P0 (leak)" << std::endl;
iddmm = mm.start("another small chunck k");
double *k = new double[small_chunck * 10];
std::fill(k, k + small_chunck * 10, 4.2);
mm.end(iddmm);
}
if (proc_id)
{
out_ref << "Retriving data for foo (MB) " << val / 1048576. << std::endl;
std::cout << "========================================" << std::endl;
std::cout << "A small chunck j (" << small_chunck * 16 << "B) in main Px (leak)" << std::endl;
iddm = dm.initAccu("A small chunck j", true);
dm.startAccu(iddm);
double *j = new double[small_chunck * 2];
j[small_chunck - 1] = 3.2;
dm.endAccu(iddm);
out_ref << "========================================" << std::endl;
out_ref << "Retriving data for small chunck j " << dm.get(iddm) << "B" << std::endl;
std::cout << "========================================" << std::endl;
std::cout << "Another small chunck k (" << small_chunck * 8000 << "B) in main Px (leak)" << std::endl;
iddm = dm.initAccu("Another small chunck k", true);
iddmm = mm.start("Another small chunck k");
dm.startAccu(iddm);
double *k = new double[small_chunck * 1000];
std::fill(k, k + small_chunck * 1000, 4.2);
dm.endAccu(iddm);
mm.end(iddmm);
std::cout << "========================================" << std::endl;
out_ref << "========================================" << std::endl;
out_ref << "Retriving data for small chunck k " << dm.get(iddm) / 1024. << "KB" << std::endl;
out_ref << "========================================" << std::endl;
}
MPI_Barrier(MPI_COMM_WORLD);
dm.print();