In malloc, why use brk at all? Why not just use mmap?


Question


Typical implementations of malloc use brk/sbrk as the primary means of claiming memory from the OS. However, they also use mmap to get chunks for large allocations. Is there a real benefit to using brk instead of mmap, or is it just tradition? Wouldn't it work just as well to do it all with mmap?

(Note: I use sbrk and brk interchangeably here because they are interfaces to the same Linux system call, brk.)


For reference, here are a couple of documents describing the glibc malloc:

GNU C Library Reference Manual: The GNU Allocator
https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html

glibc wiki: Overview of Malloc
https://sourceware.org/glibc/wiki/MallocInternals

What these documents describe is that sbrk is used to claim a primary arena for small allocations, mmap is used to claim secondary arenas, and mmap is also used to claim space for large objects ("much larger than a page").

The use of both the application heap (claimed with sbrk) and mmap introduces some additional complexity that might be unnecessary:

A (0x04) Allocated Arena - the main arena uses the application's heap. Other arenas use mmap'd heaps. To map a chunk to a heap, you need to know which case applies. If this bit is 0, the chunk comes from the main arena and the main heap. If this bit is 1, the chunk comes from mmap'd memory and the location of the heap can be computed from the chunk's address.

[Glibc malloc is derived from ptmalloc, which was derived from dlmalloc, which was started in 1987.]


The jemalloc manpage (http://jemalloc.net/jemalloc.3.html) has this to say:

Traditionally, allocators have used sbrk(2) to obtain memory, which is suboptimal for several reasons, including race conditions, increased fragmentation, and artificial limitations on maximum usable memory. If sbrk(2) is supported by the operating system, this allocator uses both mmap(2) and sbrk(2), in that order of preference; otherwise only mmap(2) is used.

So, they even say here that sbrk is suboptimal but they use it anyway, even though they've already gone to the trouble of writing their code so that it works without it.

[Writing of jemalloc started in 2005.]

UPDATE: Thinking about this more, that bit about "in that order of preference" gives me a line of inquiry. Why an order of preference? Are they just using sbrk as a fallback in case mmap is not supported (or lacks necessary features), or is it possible for the process to get into some state where it can use sbrk but not mmap? I'll look at their code and see if I can figure out what it's doing.


I'm asking because I'm implementing a garbage collection system in C, and so far I see no reason to use anything besides mmap. I'm wondering if there's something I'm missing, though.

(In my case I have an additional reason to avoid brk: I might need to use malloc at some point, and malloc manages the same program break, so the two could interfere.)

Answer

The system call brk() has the advantage of having only a single data item to track memory use, which happily is also directly related to the total size of the heap.

brk() has had exactly the same form since 1975's Unix V6. Mind you, V6 supported a user address space of 65,535 bytes, so not much thought was given to managing much more than 64K, certainly not terabytes.

Using mmap seems reasonable, until I start wondering how a modified or added-on garbage collector could use mmap without also rewriting the allocation algorithm.

Will that work nicely with realloc(), fork(), etc.?
