How to know which malloc is used?


Question

The way I understand it, there exist many different malloc implementations:


  • dlmalloc – General purpose allocator
  • ptmalloc2 – glibc
  • jemalloc – FreeBSD and Firefox
  • tcmalloc – Google
  • libumem – Solaris

Is there any way to determine which malloc is actually used on my (linux) system?

I read that "due to ptmalloc2’s threading support, it became the default memory allocator for linux." Is there any way for me to check this myself?

I am asking because I do not seem to get any speedup by parallelizing my malloc loop in the code below:

#include <cstdlib>
#include <iostream>
#include <vector>
#include <omp.h>
#include <boost/date_time/posix_time/posix_time.hpp>

// Times mallocCnt allocations of 100 bytes each, then the matching frees,
// using the requested number of OpenMP threads.
void parallelMalloc(int parallelism, int mallocCnt = 10000000) {

    omp_set_num_threads(parallelism);

    std::vector<char*> ptrStore(mallocCnt);

    boost::posix_time::ptime t1 = boost::posix_time::microsec_clock::local_time();

    // Each thread allocates its share of the blocks.
    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        ptrStore[i] = ((char*)malloc(100 * sizeof(char)));
    }

    boost::posix_time::ptime t2 = boost::posix_time::microsec_clock::local_time();

    // Each thread frees its share of the blocks.
    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        free(ptrStore[i]);
    }

    boost::posix_time::ptime t3 = boost::posix_time::microsec_clock::local_time();

    boost::posix_time::time_duration malloc_time = t2 - t1;
    boost::posix_time::time_duration free_time   = t3 - t2;

    std::cout << " parallelism = "  << parallelism << "\t itr = " << mallocCnt <<  "\t malloc_time = " <<
            malloc_time.total_milliseconds() << "\t free_time = " << free_time.total_milliseconds() << std::endl;
}

int main() {
    // Repeat the benchmark with 1 to 16 threads.
    for (int i = 1; i <= 16; i += 1) {
        parallelMalloc(i);
    }
}

This gives the output:

 parallelism = 1         itr = 10000000  malloc_time = 1225      free_time = 1517
 parallelism = 2         itr = 10000000  malloc_time = 1614      free_time = 1112
 parallelism = 3         itr = 10000000  malloc_time = 1619      free_time = 687
 parallelism = 4         itr = 10000000  malloc_time = 2325      free_time = 620
 parallelism = 5         itr = 10000000  malloc_time = 2233      free_time = 550
 parallelism = 6         itr = 10000000  malloc_time = 2207      free_time = 489
 parallelism = 7         itr = 10000000  malloc_time = 2778      free_time = 398
 parallelism = 8         itr = 10000000  malloc_time = 1813      free_time = 389
 parallelism = 9         itr = 10000000  malloc_time = 1997      free_time = 350
 parallelism = 10        itr = 10000000  malloc_time = 1922      free_time = 291
 parallelism = 11        itr = 10000000  malloc_time = 2480      free_time = 257
 parallelism = 12        itr = 10000000  malloc_time = 1614      free_time = 256
 parallelism = 13        itr = 10000000  malloc_time = 1387      free_time = 289
 parallelism = 14        itr = 10000000  malloc_time = 1481      free_time = 248
 parallelism = 15        itr = 10000000  malloc_time = 1252      free_time = 297
 parallelism = 16        itr = 10000000  malloc_time = 1063      free_time = 281


Recommended answer


I read that "due to ptmalloc2’s threading support, it became the default memory allocator for linux." Is there any way for me to check this myself?

glibc internally uses ptmalloc2 and this isn't a recent development. Either way, it's not terribly difficult to do getconf GNU_LIBC_VERSION, then cross-check the version to see if ptmalloc2 is used in that version or not, but I'm willing to bet you'd be wasting your time.


I am asking because I do not seem to get any speedup by parallelizing my malloc loop in the code below

I turned your example into an MCVE (omitting the code here for brevity) and compiled it with g++ 5.3.1 using g++ -Wall -pedantic -O3 -pthread -fopenmp. Here are my results.

With OpenMP:

 parallelism = 1     itr = 10000000  malloc_time = 746   free_time = 263
 parallelism = 2     itr = 10000000  malloc_time = 541   free_time = 267
 parallelism = 3     itr = 10000000  malloc_time = 405   free_time = 259
 parallelism = 4     itr = 10000000  malloc_time = 324   free_time = 221
 parallelism = 5     itr = 10000000  malloc_time = 330   free_time = 242
 parallelism = 6     itr = 10000000  malloc_time = 287   free_time = 244
 parallelism = 7     itr = 10000000  malloc_time = 257   free_time = 226
 parallelism = 8     itr = 10000000  malloc_time = 270   free_time = 225
 parallelism = 9     itr = 10000000  malloc_time = 253   free_time = 225
 parallelism = 10    itr = 10000000  malloc_time = 236   free_time = 226
 parallelism = 11    itr = 10000000  malloc_time = 225   free_time = 239
 parallelism = 12    itr = 10000000  malloc_time = 276   free_time = 258
 parallelism = 13    itr = 10000000  malloc_time = 241   free_time = 228
 parallelism = 14    itr = 10000000  malloc_time = 254   free_time = 225
 parallelism = 15    itr = 10000000  malloc_time = 278   free_time = 272
 parallelism = 16    itr = 10000000  malloc_time = 235   free_time = 220

23.87 user 
2.11 system 
0:10.41 elapsed 
249% CPU

Without OpenMP:

 parallelism = 1     itr = 10000000  malloc_time = 748   free_time = 263
 parallelism = 2     itr = 10000000  malloc_time = 344   free_time = 256
 parallelism = 3     itr = 10000000  malloc_time = 751   free_time = 254
 parallelism = 4     itr = 10000000  malloc_time = 339   free_time = 262
 parallelism = 5     itr = 10000000  malloc_time = 748   free_time = 253
 parallelism = 6     itr = 10000000  malloc_time = 330   free_time = 256
 parallelism = 7     itr = 10000000  malloc_time = 734   free_time = 260
 parallelism = 8     itr = 10000000  malloc_time = 334   free_time = 259
 parallelism = 9     itr = 10000000  malloc_time = 750   free_time = 256
 parallelism = 10    itr = 10000000  malloc_time = 339   free_time = 255
 parallelism = 11    itr = 10000000  malloc_time = 743   free_time = 267
 parallelism = 12    itr = 10000000  malloc_time = 342   free_time = 261
 parallelism = 13    itr = 10000000  malloc_time = 739   free_time = 252
 parallelism = 14    itr = 10000000  malloc_time = 333   free_time = 252
 parallelism = 15    itr = 10000000  malloc_time = 740   free_time = 252
 parallelism = 16    itr = 10000000  malloc_time = 330   free_time = 252

13.38 user 
4.66 system 
0:18.08 elapsed 
99% CPU 

Parallelism seems to be faster by about 8 seconds. Still not convinced? OK. I went ahead and grabbed dlmalloc and ran make to produce libmalloc.a. My new command line is g++ -Wall -pedantic -O3 -pthread -fopenmp -L$HOME/Development/test/dlmalloc/lib test.cpp -lmalloc

With OpenMP:

parallelism = 1  itr = 10000000  malloc_time = 814   free_time = 277

I CTRL-C'd after 37 seconds.

Without OpenMP:

 parallelism = 1     itr = 10000000  malloc_time = 772   free_time = 271
 parallelism = 2     itr = 10000000  malloc_time = 780   free_time = 272
 parallelism = 3     itr = 10000000  malloc_time = 783   free_time = 272
 parallelism = 4     itr = 10000000  malloc_time = 792   free_time = 277
 parallelism = 5     itr = 10000000  malloc_time = 813   free_time = 281
 parallelism = 6     itr = 10000000  malloc_time = 800   free_time = 275
 parallelism = 7     itr = 10000000  malloc_time = 795   free_time = 277
 parallelism = 8     itr = 10000000  malloc_time = 790   free_time = 273
 parallelism = 9     itr = 10000000  malloc_time = 788   free_time = 277
 parallelism = 10    itr = 10000000  malloc_time = 784   free_time = 276
 parallelism = 11    itr = 10000000  malloc_time = 786   free_time = 284
 parallelism = 12    itr = 10000000  malloc_time = 807   free_time = 279
 parallelism = 13    itr = 10000000  malloc_time = 791   free_time = 277
 parallelism = 14    itr = 10000000  malloc_time = 790   free_time = 273
 parallelism = 15    itr = 10000000  malloc_time = 785   free_time = 276
 parallelism = 16    itr = 10000000  malloc_time = 787   free_time = 275

6.48 user 
11.27 system 
0:17.81 elapsed 
99% CPU

Pretty significant difference. I suspect that the issue lies within your more complicated code, or something's wrong with your benchmark.
