malloc_trim(0) Releases Fastbins of Thread Arenas?

Question

For the last week or so I've been investigating a problem in an application where the memory usage accumulates over time. I narrowed it down to a line that copies a

std::vector<std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<N> > > > > >

in a worker thread (I realize this is a ridiculous way to organize memory). On a regular basis, the worker thread is destroyed and recreated, and the new thread copies that memory structure when it starts. The original data that gets copied is passed to the worker thread by reference from the main thread.

Using malloc_stats and malloc_info, I can see that when the worker thread is destroyed, the arena/heap it was using retains the memory used for that structure in its fastbin free lists. This makes sense, since there are many individual allocations of less than 64 bytes.

The problem is, when the worker thread is recreated, it creates a new arena/heap instead of reusing the previous one, such that the fastbins from previous arenas/heaps are never reused. Eventually the system runs out of memory before reusing a previous heap/arena to reuse the fastbins they're holding onto.

Somewhat by accident, I discovered that calling malloc_trim(0) in my main thread, after joining the worker thread, causes the fastbins in the thread arenas/heaps to be released. This behavior is undocumented as far as I can see. Does anyone have an explanation?

Here is some test code I'm using to see this behavior:

// includes
#include <stdio.h>
#include <stdlib.h>
#include <algorithm>
#include <vector>
#include <iostream>
#include <stdexcept>
#include <string>
#include <mcheck.h>
#include <malloc.h>
#include <map>
#include <bitset>
#include <boost/thread.hpp>
#include <boost/shared_ptr.hpp>

// Number of bits per bitset.
const int sizeOfBitsets = 40;

// Executes a system command. Used to get output of "free -m".
std::string ExecuteSystemCommand(const char* cmd) {
    char buffer[128];
    std::string result = "";
    FILE* pipe = popen(cmd, "r");
    if (!pipe) throw std::runtime_error("popen() failed!");
    try {
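        // Loop on fgets() itself rather than on feof(), which only becomes
        // true after a read has already failed.
        while (fgets(buffer, sizeof(buffer), pipe) != NULL) {
            result += buffer;
        }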
    } catch (...) {
        pclose(pipe);
        throw;
    }
    pclose(pipe);
    return result;
}

// Prints output of "free -m" and the output of malloc_stats() and malloc_info().
void PrintMemoryStats()
{
    try
    {
        char *buf;
        size_t size;
        FILE *fp;

        std::string myCommand("free -m");
        std::string result = ExecuteSystemCommand(myCommand.c_str());
        printf("Free memory is \n%s\n", result.c_str());

        malloc_stats();

        fp = open_memstream(&buf, &size);
        malloc_info(0, fp);
        fclose(fp);
        printf("# Memory Allocation Stats\n%s\n#> ", buf);
        free(buf);

    }
    catch(...)
    {
        printf("Unable to print memory stats.\n");
        throw;
    }
}

void MakeCopies(std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > >& data)
{
    try
    {
        // Create copies.
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyA(data);
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyB(data);
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyC(data);

        // Print memory info.
        printf("Memory after creating data copies:\n");
        PrintMemoryStats();
    }
    catch(...)
    {
        printf("Unable to make copies.");
        throw;
    }
}

int main(int argc, char** argv)
{
    try
    {
        // When uncommented, disables the use of fastbins.
        // mallopt(M_MXFAST, 0);

        // Print memory info.
        printf("Memory to start is:\n");
        PrintMemoryStats();

        // Sizes of original data.
        int sizeOfDataA = 2048;
        int sizeOfDataB = 4;
        int sizeOfDataC = 128;
        int sizeOfDataD = 20;
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > testData;

        // Populate data.
        testData.resize(sizeOfDataA);
        for(int a = 0; a < sizeOfDataA; ++a)
        {
            testData.at(a).resize(sizeOfDataB);
            for(int b = 0; b < sizeOfDataB; ++b)
            {
                for(int c = 0; c < sizeOfDataC; ++c)
                {
                    std::map<uint, std::bitset<sizeOfBitsets> > dataMap;
                    testData.at(a).at(b).insert(std::pair<uint, std::map<uint, std::bitset<sizeOfBitsets> > >(c, dataMap));
                    for(int d = 0; d < sizeOfDataD; ++d)
                    {
                        std::bitset<sizeOfBitsets> testBitset;
                        testData.at(a).at(b).at(c).insert(std::pair<uint, std::bitset<sizeOfBitsets> >(d, testBitset));
                    }
                }
            }
        }

        // Print memory info.
        printf("Memory to after creating original data is:\n");
        PrintMemoryStats();

        // Start thread to make copies and wait to join.
        {
            boost::shared_ptr<boost::thread> makeCopiesThread = boost::shared_ptr<boost::thread>(new boost::thread(&MakeCopies, boost::ref(testData)));
            makeCopiesThread->join();
        }

        // Print memory info.
        printf("Memory to after joining thread is:\n");
        PrintMemoryStats();

        malloc_trim(0);

        // Print memory info.
        printf("Memory to after malloc_trim(0) is:\n");
        PrintMemoryStats();

        return 0;

    }
    catch(...)
    {
        // Log warning.
        printf("Unable to run application.");

        // Return failure.
        return 1;
    }

    // Return success.
    return 0;
}

The interesting output from before and after the malloc trim call is (look for "LOOK HERE!"):

#> Memory to after joining thread is:
Free memory is
              total        used        free      shared  buff/cache   available
Mem:         257676        7361      246396          25        3918      249757
Swap:          1023           0        1023

Arena 0:
system bytes     = 1443450880
in use bytes     = 1443316976
Arena 1:
system bytes     =   35000320
in use bytes     =       6608
Total (incl. mmap):
system bytes     = 1478451200
in use bytes     = 1443323584
max mmap regions =          0
max mmap bytes   =          0
# Memory Allocation Stats
<malloc version="1">
<heap nr="0">
<sizes>
<size from="241" to="241" total="241" count="1"/>
<size from="529" to="529" total="529" count="1"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="2" size="770"/>
<system type="current" size="1443450880"/>
<system type="max" size="1443459072"/>
<aspace type="total" size="1443450880"/>
<aspace type="mprotect" size="1443450880"/>
</heap>
<heap nr="1">
<sizes>
<size from="33" to="48" total="48" count="1"/>
<size from="49" to="64" total="4026531712" count="62914558"/> <-- LOOK HERE!
<size from="65" to="80" total="160" count="2"/>
<size from="81" to="96" total="301989888" count="3145728"/> <-- LOOK HERE!
<size from="33" to="33" total="231" count="7"/>
<size from="49" to="49" total="1274" count="26"/>
<unsorted from="0" to="49377" total="1431600" count="6144"/>
</sizes>
<total type="fast" count="66060289" size="4328521808"/>
<total type="rest" count="6177" size="1433105"/>
<system type="current" size="4329967616"/>
<system type="max" size="4329967616"/>
<aspace type="total" size="35000320"/>
<aspace type="mprotect" size="35000320"/>
</heap>
<total type="fast" count="66060289" size="4328521808"/>
<total type="rest" count="6179" size="1433875"/>
<total type="mmap" count="0" size="0"/>
<system type="current" size="5773418496"/>
<system type="max" size="5773426688"/>
<aspace type="total" size="1478451200"/>
<aspace type="mprotect" size="1478451200"/>
</malloc>

#> Memory to after malloc_trim(0) is:
Free memory is
              total        used        free      shared  buff/cache   available
Mem:         257676        3269      250488          25        3918      253850
Swap:          1023           0        1023

Arena 0:
system bytes     = 1443319808
in use bytes     = 1443316976
Arena 1:
system bytes     =   35000320
in use bytes     =       6608
Total (incl. mmap):
system bytes     = 1478320128
in use bytes     = 1443323584
max mmap regions =          0
max mmap bytes   =          0
# Memory Allocation Stats
<malloc version="1">
<heap nr="0">
<sizes>
<size from="209" to="209" total="209" count="1"/>
<size from="529" to="529" total="529" count="1"/>
<unsorted from="0" to="49377" total="1431600" count="6144"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="6146" size="1432338"/>
<system type="current" size="1443459072"/>
<system type="max" size="1443459072"/>
<aspace type="total" size="1443459072"/>
<aspace type="mprotect" size="1443459072"/>
</heap>
<heap nr="1"> <---------------------------------------- LOOK HERE!
<sizes> <-- HERE!
<unsorted from="0" to="67108801" total="4296392384" count="6208"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="6208" size="4296392384"/>
<system type="current" size="4329967616"/>
<system type="max" size="4329967616"/>
<aspace type="total" size="35000320"/>
<aspace type="mprotect" size="35000320"/>
</heap>
<total type="fast" count="0" size="0"/>
<total type="rest" count="12354" size="4297824722"/>
<total type="mmap" count="0" size="0"/>
<system type="current" size="5773426688"/>
<system type="max" size="5773426688"/>
<aspace type="total" size="1478459392"/>
<aspace type="mprotect" size="1478459392"/>
</malloc>

#>

There is little to no documentation on the output of malloc_info, so I wasn't sure whether the entries I pointed out were really fastbins. To verify that they are indeed fastbins, I uncommented the line

mallopt(M_MXFAST, 0);

to disable the use of fastbins. With fastbins disabled, the memory usage for heap 1 after joining the thread, before calling malloc_trim(0), looks the same as it does with fastbins enabled after calling malloc_trim(0). Most importantly, disabling fastbins returns the memory to the system immediately after the thread is joined. With fastbins enabled, calling malloc_trim(0) after joining the thread also returns the memory to the system.

The documentation for malloc_trim(0) states that it can only free memory from the top of the main arena heap, so what is going on here? I'm running on CentOS 7 with glibc version 2.17.

Recommended answer

malloc_trim(0) states that it can only free memory from the top of the main arena heap, so what is going on here?

That documentation can be called "outdated" or "incorrect". Glibc itself has no documentation for the malloc_trim function, and Linux uses the man pages from the man-pages project. The man page for malloc_trim http://man7.org/linux/man-pages/man3/malloc_trim.3.html was written in 2012 by the maintainer of man-pages as a new page. He probably used some of the comments from the glibc malloc/malloc.c source code http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#675

  malloc_trim(size_t pad);

  If possible, gives memory back to the system (via negative
  arguments to sbrk) if there is unused memory at the `high' end of
  the malloc pool. You can call this after freeing large blocks of
  memory to potentially reduce the system-level memory requirements
  of a program. However, it cannot guarantee to reduce memory. Under
  some allocation patterns, some large free blocks of memory will be
  locked between two used chunks, so they cannot be given back to
  the system.

  The `pad' argument to malloc_trim represents the amount of free
  trailing space to leave untrimmed. If this argument is zero,
  only the minimum amount of memory to maintain internal data
  structures will be left (one page or less). Non-zero arguments
  can be supplied to maintain enough trailing space to service
  future expected allocations without having to re-obtain memory
  from the system.

  Malloc_trim returns 1 if it actually released any memory, else 0.
  On systems that do not support "negative sbrks", it will always
  return 0.

The actual implementation in glibc is __malloc_trim, and it contains code that iterates over all arenas:

http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#4552

int
__malloc_trim (size_t s)
  ...
  mstate ar_ptr = &main_arena;
  do
    {
      (void) mutex_lock (&ar_ptr->mutex);
      result |= mtrim (ar_ptr, s);
      (void) mutex_unlock (&ar_ptr->mutex);

      ar_ptr = ar_ptr->next;
    }
  while (ar_ptr != &main_arena);

Every arena is trimmed using the mtrim() (mTRIm()) function, which calls malloc_consolidate() to convert all free chunks held in fastbins (they are not coalesced when freed, which is what makes them fast) into normal free chunks (which are coalesced with adjacent free chunks):

  /* Ensure initialization/consolidation */
  malloc_consolidate (av);

  malloc_consolidate is a specialized version of free() that tears
  down chunks held in fastbins.

   Fastbins
   ...
    Chunks in fastbins keep their inuse bit set, so they cannot
    be consolidated with other free chunks. malloc_consolidate
    releases all chunks in fastbins and consolidates them with
    other free chunks.
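
The consolidation step can be observed from user code. The following is a minimal sketch, assuming glibc on Linux: it frees many small chunks, reads the number of bytes held in the main arena's fastbins with mallinfo(), calls malloc_trim(0), and reads the value again. The names FastbinReport, fastbin_bytes and free_small_chunks_then_trim are made up for this illustration. Note that mallinfo() reports the main arena only and is deprecated in favor of mallinfo2() in newer glibc versions, and that newer glibc versions also keep some freed chunks in a per-thread cache (tcache) instead of the fastbins, so the "before" value depends on the glibc version.

```cpp
#include <malloc.h>   // mallinfo, malloc_trim (glibc-specific)
#include <cstddef>
#include <cstdlib>    // malloc, free
#include <vector>

// Hypothetical helper: bytes currently held in the main arena's fastbins.
// mallinfo() does not look at thread arenas.
static long fastbin_bytes()
{
    struct mallinfo mi = mallinfo();
    return static_cast<long>(mi.fsmblks);
}

struct FastbinReport {
    long before_trim;  // fastbin bytes after freeing the small chunks
    long after_trim;   // fastbin bytes after malloc_trim(0)
};

// Allocates and frees `count` small chunks in the calling thread, then
// trims. Intended to be called from the main thread so that the chunks
// come from the main arena, which is the one mallinfo() reports.
FastbinReport free_small_chunks_then_trim(std::size_t count)
{
    std::vector<void*> chunks;
    chunks.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        chunks.push_back(std::malloc(40));  // fastbin-sized with default settings

    for (void* p : chunks)
        std::free(p);  // freed chunks are not coalesced at this point

    FastbinReport report;
    report.before_trim = fastbin_bytes();
    malloc_trim(0);    // runs malloc_consolidate() on every arena
    report.after_trim = fastbin_bytes();
    return report;
}
```

Calling free_small_chunks_then_trim(1000) from main() and printing both fields shows whether fastbin memory was present before the trim; after the trim the fastbin byte count should be zero.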

The problem is, when the worker thread is recreated, it creates a new arena/heap instead of reusing the previous one, such that the fastbins from previous arenas/heaps are never reused.

This is strange. By design, the maximum number of arenas in glibc malloc is limited to cpu_core_count * 8 (on 64-bit platforms) or cpu_core_count * 2 (on 32-bit platforms), or to the value given by the environment variable MALLOC_ARENA_MAX / the mallopt parameter M_ARENA_MAX.
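
As an illustration of the mallopt route, here is a minimal sketch, assuming glibc. The helper name limit_arena_count is made up for this example; mallopt() and M_ARENA_MAX come from <malloc.h>.

```cpp
#include <malloc.h>   // mallopt, M_ARENA_MAX (glibc-specific)

// Hypothetical helper: cap the number of malloc arenas for this process.
// mallopt() returns 1 on success and 0 on error.
bool limit_arena_count(int max_arenas)
{
    return mallopt(M_ARENA_MAX, max_arenas) == 1;
}
```

The call is best made early in main(), before any worker threads are started, since arenas that already exist are not removed. The same limit can be set without recompiling by starting the program with the environment variable, for example MALLOC_ARENA_MAX=2 ./app (./app is a placeholder name).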

You can limit the number of arenas for your application, call malloc_trim() periodically, or have each thread call malloc() with a "large" size (which calls malloc_consolidate) followed by free() just before the thread is destroyed:

_int_malloc (mstate av, size_t bytes)
  ...
  if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
    // fastbin allocation path
  ...
  if (in_smallbin_range (nb))
    // smallbin path; malloc_consolidate may be called
  ...
  /*
     If this is a large request, consolidate fastbins before continuing.
     While it might look excessive to kill all fastbins before
     even seeing if there is space available, this avoids
     fragmentation problems normally associated with fastbins.
     Also, in practice, programs tend to have runs of either small or
     large requests, but less often mixtures, so consolidation is not
     invoked all that often in most programs. And the programs that
     it is called frequently in otherwise tend to fragment.
   */

  else
    {
      idx = largebin_index (nb);
      if (have_fastchunks (av))
        malloc_consolidate (av);
    }
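
The last two suggestions can be sketched as follows. This is only an illustration under assumptions, not code from the question: it uses std::thread instead of boost::thread, the function names are made up, and the 64 KiB request size is an assumed value chosen to be in the large-bin range yet below the default mmap threshold, so that the request is served from the thread's arena.

```cpp
#include <malloc.h>   // malloc_trim (glibc-specific)
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical helper, meant to be called by a worker right before it exits.
// A "large" request makes _int_malloc() consolidate the fastbins of the
// arena it allocates from, if that arena has fastbin chunks.
void consolidate_current_arena()
{
    const std::size_t kLargeRequest = 64 * 1024;  // assumed size, see lead-in
    void* big = std::malloc(kLargeRequest);
    if (big != nullptr)
        static_cast<volatile char*>(big)[0] = 1;  // keep the pair from being optimized away
    std::free(big);
}

// Worker body: allocate and free many small chunks, then consolidate.
static void worker(std::size_t count, std::size_t* released)
{
    std::vector<void*> chunks(count, nullptr);
    for (void*& p : chunks)
        p = std::malloc(40);
    for (void* p : chunks) {
        std::free(p);
        ++*released;
    }
    consolidate_current_arena();
}

// Runs one worker, joins it, then trims all arenas from the calling thread.
// Returns the number of chunks the worker released.
std::size_t run_worker_then_trim(std::size_t count)
{
    std::size_t released = 0;
    std::thread t(worker, count, &released);
    t.join();
    malloc_trim(0);  // same call the question makes after joining
    return released;
}
```

Either measure alone may be enough; the sketch shows both only to make clear where each call belongs (inside the worker versus after the join).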

PS: there is a comment in the man page of malloc_trim https://github.com/mkerrisk/man-pages/commit/a15b0e60b297e29c825b7417582a33e6ca26bf65:

+.SH NOTES
+This function only releases memory in the main arena.
+.\" malloc/malloc.c::mTRIm():
+.\" return result | (av == &main_arena ? sYSTRIm (pad, av) : 0);

Yes, there is a check for main_arena, but it is at the very end of the malloc_trim implementation, mTRIm(), and it only guards the call to sbrk() with a negative offset. Since 2007 (glibc 2.9 and newer) there is another way to return memory to the OS: madvise(MADV_DONTNEED), which is used in all arenas (and is not documented by the author of the glibc patch or by the author of the man page). Consolidation is performed for every arena. There is also code for trimming (munmapping) the top chunk of mmap-ed heaps (heap_trim/shrink_heap, called from the slow path of free()), but it is not called from malloc_trim.
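
The madvise(MADV_DONTNEED) mechanism can be seen in isolation. The following is a minimal sketch, assuming Linux, and it is not glibc's code: it maps one private anonymous page, writes to it, advises the kernel that the page is no longer needed, and checks that the page reads back as zero, which shows that the physical page was dropped while the address range stayed mapped. The function name dontneed_discards_page is made up for this example.

```cpp
#include <sys/mman.h>  // mmap, madvise, munmap
#include <unistd.h>    // sysconf
#include <cstddef>

// Returns true if a private anonymous page reads back as zero after
// madvise(MADV_DONTNEED). This relies on Linux semantics.
bool dontneed_discards_page()
{
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* mem = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return false;

    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(mem);
    bytes[0] = 0x5A;  // touch the page so that it becomes resident

    // The mapping remains valid; the next access gets a fresh zero page.
    const bool ok = madvise(mem, page, MADV_DONTNEED) == 0 && bytes[0] == 0;

    munmap(mem, page);
    return ok;
}
```

This is why memory released this way shows up as freed in "free -m" even though the arena's address space is unchanged in the malloc_info output.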
