CPython memory allocation


Question


This is a post inspired by this comment about how memory is allocated for objects in CPython. Originally, this came up in the context of creating a list and appending to it in a for loop versus using a list comprehension.

So here are my questions:

  1. how many different allocators are there in CPython?
    • what is the function of each?
  2. when is malloc actually called? (a list comprehension may not result in a call to malloc, based on what's said in this comment)
  3. How much memory does python allocate for itself at startup?

    1. are there rules governing which data structures get first "dibs" on this memory?

  4. What happens to the memory used by an object when it is deleted (does python still hold on to the memory to allocate to another object in the future, or does the GC free up the memory for another process, say Google Chrome, to use)?
  5. When is a GC triggered?
  6. lists are dynamic arrays, which means they need a contiguous piece of memory. This means that if I try to append an object into a list, whose underlying-C-data-structure array cannot be extended, the array is copied over onto a different part of memory, where a larger contiguous block is available. So how much space is allocated to this array when I initialize a list?
    • how much extra space is allocated to the new array, which now holds the old list and the appended object?

EDIT: From the comments, I gather that there are far too many questions here. I only did this because these questions are all pretty related. Still, I'd be happy to split this up into several posts if that is the case (please let me know to do so in the comments)

Solution

Much of this is answered in the Memory Management chapter of the C API documentation.

Some of the documentation is vaguer than you're asking for. For further details, you'd have to turn to the source code. And nobody's going to be willing to do that unless you pick a specific version. (At least 2.7.5, pre-2.7.6, 3.3.2, pre-3.3.3, and pre-3.4 would be interesting to different people.)

The source to the obmalloc.c file is a good starting place for many of your questions, and the comments at the top have a nice little ASCII-art graph:

    Object-specific allocators
    _____   ______   ______       ________
   [ int ] [ dict ] [ list ] ... [ string ]       Python core         |
+3 | <----- Object-specific memory -----> | <-- Non-object memory --> |
    _______________________________       |                           |
   [   Python's object allocator   ]      |                           |
+2 | ####### Object memory ####### | <------ Internal buffers ------> |
    ______________________________________________________________    |
   [          Python's raw memory allocator (PyMem_ API)          ]   |
+1 | <----- Python memory (under PyMem manager's control) ------> |   |
    __________________________________________________________________
   [    Underlying general-purpose allocator (ex: C library malloc)   ]
 0 | <------ Virtual memory allocated for the python process -------> |

   =========================================================================
    _______________________________________________________________________
   [                OS-specific Virtual Memory Manager (VMM)               ]
-1 | <--- Kernel dynamic storage allocation & management (page-based) ---> |
    __________________________________   __________________________________
   [                                  ] [                                  ]
-2 | <-- Physical memory: ROM/RAM --> | | <-- Secondary storage (swap) --> |


how many different allocators are there in CPython?

According to the docs, "several". You could count up the ones in the builtin and stdlib types, then add the handful of generic ones, if you really wanted. But I'm not sure what it would tell you. (And it would be pretty version-specific. IIRC, the exact number even changed within the 3.3 tree, as there was an experiment with whether the new-style strings should use three different allocators or one.)


what is the function of each?

The object-specific allocators at level +3 are for specific use cases that are worth optimizing. As the docs say:

For example, integer objects are managed differently within the heap than strings, tuples or dictionaries because integers imply different storage requirements and speed/space tradeoffs.

Below that, there are various generic supporting allocators at level +2 (and +1.5 and maybe +2.5)—at least an object allocator, an arena allocator, and a small-block allocator, etc.—but all but the first are private implementation details (meaning private even to the C-API; obviously all of it is private to Python code).

And below that, there's the raw allocator, whose function is to ask the OS for more memory when the higher-level allocators need it.


when is malloc actually called?

The raw memory allocator (or its heap manager) should be the only thing that ever calls malloc. (In fact, it might not even necessarily call malloc; it might use functions like mmap or VirtualAlloc instead. But the point is that it's the only thing that ever asks the OS for memory.) There are a few exceptions within the core of Python, but they'll rarely be relevant.

The docs explicitly say that higher-level code should never try to operate on Python objects in memory obtained from malloc.

However, there are plenty of stdlib and extension modules that use malloc for purposes besides Python objects.

For example, a numpy array of 1000x1000 int32 values doesn't allocate 1 million Python ints, so it doesn't have to go through the int allocator. Instead, it just mallocs an array of 1 million C ints, and wraps them up in Python objects as needed when you access them.
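
The same idea can be seen without numpy, using the stdlib array module: one compact buffer of C values, with Python int objects created only when you index into it. (A sketch; exact sizes are platform- and version-dependent.)

```python
import sys
from array import array

n = 1000
compact = array('i', range(n))  # one contiguous buffer of n C ints
boxed = list(range(n))          # n pointers, each to a full Python int object

# The buffer costs roughly n * itemsize bytes plus a small header...
print(compact.itemsize)         # usually 4 bytes per C int on common platforms

# ...while every boxed int is a whole object with its own header.
print(sys.getsizeof(boxed[0]))  # typically 24-28 bytes in CPython 3

# Indexing the array creates a Python int on demand, as numpy does:
first = compact[0]
print(type(first))
```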


How much memory does python allocate for itself at startup?

This is platform-specific, and a bit hard to figure out from the code. However, when I launch a new python3.3 interpreter on my 64-bit Mac, it starts off with 13.1MB of virtual memory, and almost immediately expands to 201MB. So, that should be a rough ballpark guide.

are there rules governing which data structures get first "dibs" on this memory?

Not really, no. A malicious or buggy object-specific allocator could immediately grab all of the pre-allocated memory and more, and there's nothing to stop it.


What happens to the memory used by an object when it is deleted (does python still hold on to the memory to allocate to another object in the future, or does the GC free up the memory for another process, say Google Chrome, to use)?

It goes back to the object-specific allocator, which may keep it on a freelist, or release it to the raw allocator, which keeps its own freelist. The raw allocator almost never releases memory back to the OS.

This is because there's usually no good reason to release memory back to a modern OS. If you have a ton of unused pages lying around, the OS's VM will just page them out if another process needs the memory. And when there is a good reason, it's almost always application-specific, and the simplest solution is to use multiple processes to manage your huge short-lived memory requirements.
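
You can sometimes glimpse the freelists at work: when an object is freed, its slot often goes straight to the next same-shaped allocation. This sketch just observes whether a float's address was reused; this is an implementation detail, and CPython is free to answer either way.

```python
# Allocate a float, remember its address, free it, allocate another.
x = 3.14
addr = id(x)              # in CPython, id() is the object's memory address
del x                     # refcount hits zero; the slot may go on a freelist
y = 2.71
reused = (id(y) == addr)  # often True in CPython, but never guaranteed
print(reused)
```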


When is a GC triggered?

It depends on what you mean by "a GC".

CPython uses refcounting; every time you release a reference to an object (by rebinding a variable or a slot in a collection, letting a variable go out of scope, etc.), if it was the last reference, it will be cleaned up immediately. This is explained in the Reference Counting section in the docs.
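
You can watch the refcount side of this with sys.getrefcount (which always reports one extra reference, for its own argument):

```python
import sys

obj = object()
base = sys.getrefcount(obj)    # counts `obj` itself plus getrefcount's argument

holder = [obj, obj]            # two more references, from the list's slots
with_holder = sys.getrefcount(obj)

del holder                     # releasing references drops the count back
after = sys.getrefcount(obj)

print(base, with_holder, after)
```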

However, there's a problem with refcounting: if two objects reference each other, even when all outside references go away, they still won't get cleaned up. So, CPython has always had a cycle collector that periodically walks objects looking for cycles of objects that reference each other, but have no outside references. (It's a little more complicated, but that's the basic idea.) This is fully explained in the docs for the gc module. The collector can run when you ask it to explicitly, when the freelists are getting low, or when it hasn't run in a long time; this is dynamic and to some extent configurable, so it's hard to give a specific answer to "when".
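
A minimal cycle makes the difference visible: refcounting alone never frees these two objects, but an explicit gc.collect() finds them. (The exact count returned includes supporting objects like instance `__dict__`s, so only a lower bound is asserted here.)

```python
import gc

class Node:
    def __init__(self):
        self.partner = None

gc.collect()                 # start from a clean slate

a, b = Node(), Node()
a.partner, b.partner = b, a  # reference cycle: a -> b -> a
del a, b                     # refcounts never reach zero on their own

found = gc.collect()         # the cycle collector finds the unreachable cycle
print(found)                 # at least the two Nodes
```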


lists are dynamic arrays, which means they need a contiguous piece of memory. This means that if I try to append an object into a list, whose underlying-C-data-structure array cannot be extended, the array is copied over onto a different part of memory, where a larger contiguous block is available. So how much space is allocated to this array when I initialize a list?

The code for this is mostly inside listobject.c. It's complicated; there are a bunch of special cases, like the code used by timsort for creating temporary intermediate lists and for non-in-place sorting. But ultimately, some piece of code decides it needs room for N pointers.

It's also not particularly interesting. Most lists are either never expanded, or expanded far beyond the original size, so doing extra allocation at the start wastes memory for static lists and doesn't help much for most growing lists. So, Python plays it conservative. I believe it starts by looking through its internal freelist for a block that's not too much bigger than N pointers (it might also consolidate adjacent freed list storage; I don't know whether it does), so it might overallocate a little occasionally, but generally it doesn't. The exact code should be in PyList_New.

At any rate, if there's no space in the list allocator's freelist, it drops down to the object allocator, and so on through the levels; it may end up hitting level 0, but usually it doesn't.

how much extra space is allocated to the new array, which now holds the old list and the appended object?

This is handled in list_resize, and this is the interesting part.

The only way to avoid list.append being quadratic is to over-allocate geometrically. Over-allocating by too small a factor (like 1.2) wastes too much time on the first few expansions; too large a factor (like 1.6) wastes too much space for very large arrays. Python handles this by using a sequence that starts off at 2.0 but quickly converges toward somewhere around 1.25. According to the 3.3 source:

The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
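
The 3.3 list_resize over-allocation is roughly new_allocated = newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6). A quick simulation of that formula (a sketch of the 3.3 behavior; later versions tweak it) reproduces the growth pattern quoted above:

```python
def list_growth(n_appends):
    """Simulate CPython 3.3's list over-allocation across successive appends."""
    allocated = 0
    capacities = []
    for size in range(1, n_appends + 1):
        if size > allocated:  # out of spare slots: resize
            allocated = size + (size >> 3) + (3 if size < 9 else 6)
            capacities.append(allocated)
    return capacities

print(list_growth(88))  # [4, 8, 16, 25, 35, 46, 58, 72, 88]
```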


You didn't ask specifically about sorted, but I know that's what prompted you.

Remember that timsort is primarily a merge sort, with an insertion sort for small sublists that aren't already sorted. So, most of its operations involve allocating a new list of about size 2N and freeing two lists of about size N. So, it can be almost as space- and allocation-efficient when copying as it would be in-place. There is up to O(log N) waste, but this usually isn't the factor that makes a copying sort slower.

