How is tuple implemented in CPython?


Question

I've been trying to learn how CPython is implemented behind the scenes. It's great that Python is high level, but I don't like treating it like a black box.

With that in mind, how are tuples implemented? I've had a look at the source (tupleobject.c), but it's going over my head.

I see that PyTuple_MAXSAVESIZE = 20 and PyTuple_MAXFREELIST = 2000. What is "saving", and what is the "free list"? (Will there be a performance difference between tuples of length 20/21 or 2000/2001? What enforces the maximum tuple length?)

Solution

As a caveat, everything in this answer is based on what I've gleaned from looking over the implementation you linked.

It seems that the standard implementation of a tuple is simply an array of references to its elements. However, there are a bunch of optimizations in place to speed things up.
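One rough way to see the array layout from Python itself is to measure how much a tuple grows per element. This is only a sketch, and it assumes CPython, where sys.getsizeof reports the object's own footprint. The helper name per_item_growth is made up for illustration.

```python
import struct
import sys


def per_item_growth(max_len=5):
    """Return the size increase, in bytes, caused by each extra tuple element."""
    # Build the tuples at run time so every length goes through a normal allocation.
    sizes = [sys.getsizeof(tuple(range(n))) for n in range(max_len + 1)]
    return [bigger - smaller for smaller, bigger in zip(sizes, sizes[1:])]


# Size of one C pointer on this interpreter build.
POINTER_SIZE = struct.calcsize("P")

print("growth per element:", per_item_growth())
print("pointer size:", POINTER_SIZE)
```

If every step of growth equals the pointer size, that is consistent with a fixed header followed by one pointer slot per element, which is what "just an array" would predict.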

First, if you try to make an empty tuple, CPython will instead hand back a canonical object representing the empty tuple. As a result, it avoids a bunch of allocations that would each create just one more identical empty object.
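The shared empty tuple can be checked with an identity test. The sketch below assumes CPython (other implementations are free to behave differently), and the helper name empty_tuples is hypothetical. It builds empty tuples along several run-time paths and checks whether they are all the same object.

```python
def empty_tuples():
    """Build empty tuples through several different run-time paths."""
    source = tuple(range(3))
    return [
        tuple(),       # constructor with no argument
        tuple([]),     # conversion from an empty list
        source[3:],    # slice that selects nothing
        source * 0,    # repetition zero times
    ]


empties = empty_tuples()
# Identity (is), not equality (==), is the interesting comparison here.
all_same_object = all(t is empties[0] for t in empties)
print("all empty tuples are one object:", all_same_object)
```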

Next, to avoid repeatedly allocating small objects, CPython recycles the memory of many small tuples. There is a fixed constant (PyTuple_MAXSAVESIZE) such that every tuple shorter than this length is eligible to have its space reclaimed. Whenever a tuple shorter than this constant is deallocated, there is a chance that its memory will not be freed and will instead be stored in a "free list" (described below) chosen by its size. That way, if you need to allocate a tuple of size n and one of that size was previously allocated and is no longer in use, CPython can just recycle the old array.
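You can poke at this recycling from Python, with the caveat that it is an implementation detail and nothing guarantees the outcome. The sketch below assumes CPython, where id() is the object's memory address. The helper name freed_tuple_is_reused is made up for illustration. It frees a small tuple, immediately creates another of the same length, and reports whether the new one landed at the old address.

```python
def freed_tuple_is_reused(size=3):
    """Report whether a new tuple of `size` items reuses a just-freed tuple's address."""
    first = tuple(range(size))            # built at run time, not a compile-time constant
    old_address = id(first)
    del first                             # last reference gone, so the tuple is deallocated
    second = tuple(range(size, 2 * size))
    return id(second) == old_address


print("same address reused:", freed_tuple_is_reused())
```

A True result is consistent with the free list handing back the old block, but a False result does not prove the opposite, since other allocations can happen in between.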

The free list itself is implemented as an array of size PyTuple_MAXSAVESIZE storing pointers to unused tuples, where the nth element of the array is either NULL (if no spare tuples of size n are available) or a pointer to a reclaimed tuple of size n. If several tuples of size n could be reused, they are chained together in a sort of linked list by having each tuple's zeroth entry point to the next tuple that can be reused. (Since only one tuple of length zero is ever allocated, there is never a risk of reading a nonexistent zeroth element.) In this way, the allocator can keep some number of tuples of each size around for reuse. To ensure that this doesn't use too much memory, there is a second constant (PyTuple_MAXFREELIST) that caps the length of the linked list in each bucket. A secondary array of length PyTuple_MAXSAVESIZE stores the current length of the linked list for each tuple size so that this upper limit isn't exceeded.
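To make the bookkeeping concrete, here is a toy model of the scheme described above, written in Python rather than C. It is a sketch of my reading of the source, not CPython's actual code. ToyTupleAllocator, allocate and release are hypothetical names, a Python list stands in for the C array of slots, and the two constants simply mirror the values quoted in the question.

```python
MAXSAVESIZE = 20     # mirrors PyTuple_MAXSAVESIZE from the question
MAXFREELIST = 2000   # mirrors PyTuple_MAXFREELIST from the question


class ToyTupleAllocator:
    """Toy model of per-size free lists (illustrative only, not CPython code)."""

    def __init__(self, max_save_size=MAXSAVESIZE, max_free_list=MAXFREELIST):
        self.max_save_size = max_save_size
        self.max_free_list = max_free_list
        # free_list[n] is the head of the chain of parked blocks of size n, or None.
        self.free_list = [None] * max_save_size
        # num_free[n] counts how many blocks are currently parked in bucket n.
        self.num_free = [0] * max_save_size

    def allocate(self, size):
        """Return a block with `size` slots, reusing a parked one if possible."""
        if 0 < size < self.max_save_size and self.free_list[size] is not None:
            block = self.free_list[size]
            # Slot 0 of a parked block holds the link to the next parked block.
            self.free_list[size] = block[0]
            self.num_free[size] -= 1
            block[0] = None
            return block
        return [None] * size  # stands in for a fresh memory allocation

    def release(self, block):
        """Park a block for reuse if it is eligible; return True if it was parked."""
        size = len(block)
        if 0 < size < self.max_save_size and self.num_free[size] < self.max_free_list:
            for i in range(size):
                block[i] = None               # drop references to the old items
            block[0] = self.free_list[size]   # link to the previous head of the chain
            self.free_list[size] = block
            self.num_free[size] += 1
            return True
        return False  # stands in for really freeing the memory


allocator = ToyTupleAllocator()
old_block = allocator.allocate(3)
allocator.release(old_block)
print("block recycled:", allocator.allocate(3) is old_block)
```

Size 0 is deliberately excluded from the buckets in this model, because a block with no slots has nowhere to store the link to the next block.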

All in all, it's a very clever implementation!

Hope this helps!
