How to know which malloc is used?
Problem description
The way I understand it, there exist many different malloc implementations:
- dlmalloc – General purpose allocator
- ptmalloc2 – glibc
- jemalloc – FreeBSD and Firefox
- tcmalloc – Google
- libumem – Solaris
Is there any way to determine which malloc is actually used on my (linux) system?
I read that "due to ptmalloc2’s threading support, it became the default memory allocator for linux." Is there any way for me to check this myself?
I am asking because I do not seem to get any speedup by parallelizing my malloc loop in the code below:
#include <cstdlib>
#include <iostream>
#include <vector>
#include <omp.h>
#include <boost/date_time/posix_time/posix_time.hpp>

// Time mallocCnt allocations followed by mallocCnt frees,
// each loop split across `parallelism` OpenMP threads.
void parallelMalloc(int parallelism, int mallocCnt = 10000000) {
    omp_set_num_threads(parallelism);
    std::vector<char*> ptrStore(mallocCnt);

    boost::posix_time::ptime t1 = boost::posix_time::microsec_clock::local_time();
    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        ptrStore[i] = ((char*)malloc(100 * sizeof(char)));
    }

    boost::posix_time::ptime t2 = boost::posix_time::microsec_clock::local_time();
    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        free(ptrStore[i]);
    }

    boost::posix_time::ptime t3 = boost::posix_time::microsec_clock::local_time();
    boost::posix_time::time_duration malloc_time = t2 - t1;
    boost::posix_time::time_duration free_time = t3 - t2;
    std::cout << " parallelism = " << parallelism << "\t itr = " << mallocCnt << "\t malloc_time = " <<
        malloc_time.total_milliseconds() << "\t free_time = " << free_time.total_milliseconds() << std::endl;
}

int main() {
    // Repeat the measurement for 1 to 16 threads.
    for (int i = 1; i <= 16; i += 1) {
        parallelMalloc(i);
    }
}
This gives the output:
parallelism = 1 itr = 10000000 malloc_time = 1225 free_time = 1517
parallelism = 2 itr = 10000000 malloc_time = 1614 free_time = 1112
parallelism = 3 itr = 10000000 malloc_time = 1619 free_time = 687
parallelism = 4 itr = 10000000 malloc_time = 2325 free_time = 620
parallelism = 5 itr = 10000000 malloc_time = 2233 free_time = 550
parallelism = 6 itr = 10000000 malloc_time = 2207 free_time = 489
parallelism = 7 itr = 10000000 malloc_time = 2778 free_time = 398
parallelism = 8 itr = 10000000 malloc_time = 1813 free_time = 389
parallelism = 9 itr = 10000000 malloc_time = 1997 free_time = 350
parallelism = 10 itr = 10000000 malloc_time = 1922 free_time = 291
parallelism = 11 itr = 10000000 malloc_time = 2480 free_time = 257
parallelism = 12 itr = 10000000 malloc_time = 1614 free_time = 256
parallelism = 13 itr = 10000000 malloc_time = 1387 free_time = 289
parallelism = 14 itr = 10000000 malloc_time = 1481 free_time = 248
parallelism = 15 itr = 10000000 malloc_time = 1252 free_time = 297
parallelism = 16 itr = 10000000 malloc_time = 1063 free_time = 281
Recommended answer
I read that "due to ptmalloc2’s threading support, it became the default memory allocator for linux." Is there any way for me to check this myself?
glibc internally uses ptmalloc2, and this isn't a recent development. Either way, it's not terribly difficult to run getconf GNU_LIBC_VERSION, then cross-check the version to see whether ptmalloc2 is used in that version, but I'm willing to bet you'd be wasting your time.
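The same check can be made from inside a program. The following is a minimal sketch, assuming a Linux system with glibc: gnu_get_libc_version() and confstr(_CS_GNU_LIBC_VERSION, ...) report the glibc version (the latter is the string that getconf GNU_LIBC_VERSION prints), while dlsym() plus dladdr() report which loaded object the malloc symbol resolves to. The helper names glibcRuntimeVersion, glibcConfstrVersion, mallocProvider and reportAllocator are made up for this illustration.

```cpp
#include <dlfcn.h>             // dlsym, dladdr, RTLD_DEFAULT (GNU extensions)
#include <gnu/libc-version.h>  // gnu_get_libc_version
#include <unistd.h>            // confstr, _CS_GNU_LIBC_VERSION
#include <iostream>
#include <string>
#include <vector>

// glibc version the process is running against, e.g. "2.31".
std::string glibcRuntimeVersion() {
    return gnu_get_libc_version();
}

// The string that `getconf GNU_LIBC_VERSION` prints, e.g. "glibc 2.31".
std::string glibcConfstrVersion() {
    // A first call with a null buffer returns the required size,
    // including the terminating null byte.
    size_t len = confstr(_CS_GNU_LIBC_VERSION, nullptr, 0);
    if (len == 0) return "unknown";
    std::vector<char> buf(len);
    confstr(_CS_GNU_LIBC_VERSION, buf.data(), buf.size());
    return std::string(buf.data());
}

// Path of the loaded object that provides the malloc symbol,
// or "unknown" if it cannot be determined.
std::string mallocProvider() {
    void* addr = dlsym(RTLD_DEFAULT, "malloc");
    Dl_info info;
    if (addr == nullptr || dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return "unknown";
    }
    return info.dli_fname;
}

// Hypothetical convenience wrapper: print everything in one go.
void reportAllocator() {
    std::cout << "glibc (runtime): " << glibcRuntimeVersion() << "\n"
              << "glibc (confstr): " << glibcConfstrVersion() << "\n"
              << "malloc provided by: " << mallocProvider() << std::endl;
}
```

Calling reportAllocator() from main should name the libc shared object when the default allocator is in use. If another allocator has been preloaded (for example through LD_PRELOAD), the path printed for malloc would be expected to point at that library instead. On glibc versions before 2.34 the program needs to be linked with -ldl.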
I am asking because I do not seem to get any speedup by parallelizing my malloc loop in the code below
Turning your example into an MVCE (omitting code here for brevity) and compiling with g++ -Wall -pedantic -O3 -pthread -fopenmp, using g++ 5.3.1, here are my results.
With OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 746 free_time = 263
parallelism = 2 itr = 10000000 malloc_time = 541 free_time = 267
parallelism = 3 itr = 10000000 malloc_time = 405 free_time = 259
parallelism = 4 itr = 10000000 malloc_time = 324 free_time = 221
parallelism = 5 itr = 10000000 malloc_time = 330 free_time = 242
parallelism = 6 itr = 10000000 malloc_time = 287 free_time = 244
parallelism = 7 itr = 10000000 malloc_time = 257 free_time = 226
parallelism = 8 itr = 10000000 malloc_time = 270 free_time = 225
parallelism = 9 itr = 10000000 malloc_time = 253 free_time = 225
parallelism = 10 itr = 10000000 malloc_time = 236 free_time = 226
parallelism = 11 itr = 10000000 malloc_time = 225 free_time = 239
parallelism = 12 itr = 10000000 malloc_time = 276 free_time = 258
parallelism = 13 itr = 10000000 malloc_time = 241 free_time = 228
parallelism = 14 itr = 10000000 malloc_time = 254 free_time = 225
parallelism = 15 itr = 10000000 malloc_time = 278 free_time = 272
parallelism = 16 itr = 10000000 malloc_time = 235 free_time = 220
23.87 user
2.11 system
0:10.41 elapsed
249% CPU
Without OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 748 free_time = 263
parallelism = 2 itr = 10000000 malloc_time = 344 free_time = 256
parallelism = 3 itr = 10000000 malloc_time = 751 free_time = 254
parallelism = 4 itr = 10000000 malloc_time = 339 free_time = 262
parallelism = 5 itr = 10000000 malloc_time = 748 free_time = 253
parallelism = 6 itr = 10000000 malloc_time = 330 free_time = 256
parallelism = 7 itr = 10000000 malloc_time = 734 free_time = 260
parallelism = 8 itr = 10000000 malloc_time = 334 free_time = 259
parallelism = 9 itr = 10000000 malloc_time = 750 free_time = 256
parallelism = 10 itr = 10000000 malloc_time = 339 free_time = 255
parallelism = 11 itr = 10000000 malloc_time = 743 free_time = 267
parallelism = 12 itr = 10000000 malloc_time = 342 free_time = 261
parallelism = 13 itr = 10000000 malloc_time = 739 free_time = 252
parallelism = 14 itr = 10000000 malloc_time = 333 free_time = 252
parallelism = 15 itr = 10000000 malloc_time = 740 free_time = 252
parallelism = 16 itr = 10000000 malloc_time = 330 free_time = 252
13.38 user
4.66 system
0:18.08 elapsed
99% CPU
Parallelism seems to be faster by about 8 seconds. Still not convinced? OK. I went ahead and grabbed dlmalloc and ran make to produce libmalloc.a. My new command line is g++ -Wall -pedantic -O3 -pthread -fopenmp -L$HOME/Development/test/dlmalloc/lib test.cpp -lmalloc
With OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 814 free_time = 277
I CTRL-C'd after 37 seconds.
Without OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 772 free_time = 271
parallelism = 2 itr = 10000000 malloc_time = 780 free_time = 272
parallelism = 3 itr = 10000000 malloc_time = 783 free_time = 272
parallelism = 4 itr = 10000000 malloc_time = 792 free_time = 277
parallelism = 5 itr = 10000000 malloc_time = 813 free_time = 281
parallelism = 6 itr = 10000000 malloc_time = 800 free_time = 275
parallelism = 7 itr = 10000000 malloc_time = 795 free_time = 277
parallelism = 8 itr = 10000000 malloc_time = 790 free_time = 273
parallelism = 9 itr = 10000000 malloc_time = 788 free_time = 277
parallelism = 10 itr = 10000000 malloc_time = 784 free_time = 276
parallelism = 11 itr = 10000000 malloc_time = 786 free_time = 284
parallelism = 12 itr = 10000000 malloc_time = 807 free_time = 279
parallelism = 13 itr = 10000000 malloc_time = 791 free_time = 277
parallelism = 14 itr = 10000000 malloc_time = 790 free_time = 273
parallelism = 15 itr = 10000000 malloc_time = 785 free_time = 276
parallelism = 16 itr = 10000000 malloc_time = 787 free_time = 275
6.48 user
11.27 system
0:17.81 elapsed
99% CPU
Pretty significant difference. I suspect that the issue lies within your more complicated code, or something's wrong with your benchmark.
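For anyone who wants to repeat the comparison, here is a rough sketch of a self-contained benchmark. It is not the MVCE used for the numbers above (that code was omitted), only an illustration under a few assumptions: timing uses std::chrono::steady_clock rather than Boost's wall-clock local_time, the OpenMP calls are guarded by _OPENMP so the same file builds with and without -fopenmp, and the names Timing, timeMallocFree and runSweep are invented here.

```cpp
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Elapsed milliseconds for the allocation loop and the free loop.
struct Timing {
    long long mallocMs;
    long long freeMs;
};

// Allocate `count` 100-byte blocks, then free them, timing each phase.
// With -fopenmp both loops are split across `threads` threads;
// without it the pragmas are ignored and the loops run serially.
Timing timeMallocFree(int threads, int count) {
    using Clock = std::chrono::steady_clock;  // monotonic clock
    using Ms = std::chrono::milliseconds;
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads;  // unused in the serial build
#endif
    std::vector<char*> ptrs(count, nullptr);

    auto t1 = Clock::now();
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        ptrs[i] = static_cast<char*>(std::malloc(100));
    }
    auto t2 = Clock::now();
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        std::free(ptrs[i]);
    }
    auto t3 = Clock::now();

    return Timing{std::chrono::duration_cast<Ms>(t2 - t1).count(),
                  std::chrono::duration_cast<Ms>(t3 - t2).count()};
}

// Hypothetical driver: repeat the measurement for 1 to maxThreads threads.
void runSweep(int maxThreads, int count) {
    for (int p = 1; p <= maxThreads; ++p) {
        Timing t = timeMallocFree(p, count);
        std::cout << " parallelism = " << p << "\t itr = " << count
                  << "\t malloc_time = " << t.mallocMs
                  << "\t free_time = " << t.freeMs << std::endl;
    }
}
```

Calling runSweep(16, 10000000) from main mirrors the loop in the question. Building the file once with -fopenmp and once without gives the two variants compared above, and wrapping each run in the time command gives the user, system and elapsed totals.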