Python的内存模型 [英] Python Memory Model

查看:1427
本文介绍了Python的内存模型的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个非常大的列表
假如我这样做(是的,我知道code是非常unpythonic,但这个例子的缘故。):

I have a very large list Suppose I do that (yeah, I know the code is very unpythonic, but for the example's sake..):

n = (2**32)**2
for i in xrange(10**7)
  li[i] = n

正常工作。但是:

works fine. however:

for i in xrange(10**7)
  li[i] = i**2

消耗的内存显著用量较大。我不明白为什么会这样 - 存储大数目需要更多的位,并在Java中,第二个选项的确是更多的内存效率......

consumes a significantly larger amount of memory. I don't understand why that is - storing the big number takes more bits, and in Java, the the second option is indeed more memory-efficient...

有没有人有一个解释呢?

Does anyone have an explanation for this?

推荐答案

Java的特殊情况下,一些值类型(包括整数),使它们存储的值(而不是由对象引用其他事物一样)。 Python没有特殊情况下,这种类型,所以分配n至许多条目的列表中的的(或其他普通的Python容器)没有进行复印。

Java special-cases a few value types (including integers) so that they're stored by value (instead of, by object reference like everything else). Python doesn't special-case such types, so that assigning n to many entries in a list (or other normal Python container) doesn't have to make copies.

编辑:注意,引用总是指向的对象的,而不是变量 - 有没有这样的东西在Python(或Java)的引用给一个变量。例如:

note that the references are always to objects, not "to variables" -- there's no such thing as "a reference to a variable" in Python (or Java). For example:

>>> n = 23
>>> a = [n,n]
>>> print id(n), id(a[0]), id(a[1])
8402048 8402048 8402048
>>> n = 45
>>> print id(n), id(a[0]), id(a[1])
8401784 8402048 8402048

我们从第一次印刷看到,在列表中两个条目 A 指完全相同的对象 N 指到 - 但是当 N 被重新分配,现指不同的对象,而在这两个条目仍参考previous之一。

We see from the first print that both entries in list a refer to exactly the same object as n refers to -- but when n is reassigned, it now refers to a different object, while both entries in a still refer to the previous one.

这是 array.array (Python标准库模块阵列)是从列表非常不同:它保持均匀型的紧凑型副本,都需要存储类型的值的副本以每件少位。所有正常的容器保持引用(在C-codeD Python运行时作为指针指向的PyObject结构内部实现:每个指针,在32位版本的,需要4个字节,每个的PyObject至少16左右[包括指向类型,引用计数,实际值和malloc的围捕]),数组不(让他们无法异类,不能有项目,除了从一些基本类型,等等)。

An array.array (from the Python standard library module array) is very different from a list: it keeps compact copies of a homogeneous type, taking as few bits per item as are needed to store copies of values of that type. All normal containers keep references (internally implemented in the C-coded Python runtime as pointers to PyObject structures: each pointer, on a 32-bit build, takes 4 bytes, each PyObject at least 16 or so [including pointer to type, reference count, actual value, and malloc rounding up]), arrays don't (so they can't be heterogeneous, can't have items except from a few basic types, etc).

例如,1000项的容器,所有项目是不同的小整数(那些其值可以在每个2字节适合),将需要大约2000个字节的数据作为一个的 array.array( 'H'),但约20000为列表。但是,如果所有的项目都是相同的号码,该数组将仍然需要2000个字节的数据,名单将只需要20个左右[在这些情况下,您必须添加有关容器对象另有16或32字节的每一位适当的,除了在存储器中的数据]

For example, a 1000-items container, with all items being different small integers (ones whose values can fit in 2 bytes each), would take about 2,000 bytes of data as an array.array('h'), but about 20,000 as a list. But if all items were the same number, the array would still take 2,000 bytes of data, the list would take only 20 or so [[in every one of these cases you have to add about another 16 or 32 bytes for the container-object proper, in addition to the memory for the data]].

不过,虽然问题说数组(即使是在标签),我怀疑它的改编实际上是一个数组 - 如果它是,它不能储存(2 *的 32)的2(在阵列最大INT的值是32位),并且将没有实际观察到的问题报告的存储器的行为。因此,问题可能是在实际上大约一个列表,而不是阵列。

However, although the question says "array" (even in a tag), I doubt its arr is actually an array -- if it were, it could not store (2*32)2 (largest int values in an array are 32 bits) and the memory behavior reported in the question would not actually be observed. So, the question is probably in fact about a list, not an array.

修改:由@ooboo评论问很多合理的后续问题,而不是试图在压碎评论我在这里移动它的详细解释

Edit: a comment by @ooboo asks lots of reasonable followup questions, and rather than trying to squish the detailed explanation in a comment I'm moving it here.

这很奇怪,虽然 - 毕竟,怎么
  该参考整数存储?
  编号(变量)给出的整数,该
  参考是整数本身,是不
  它更便宜地使用整数?

It's weird, though - after all, how is the reference to the integer stored? id(variable) gives an integer, the reference is an integer itself, isn't it cheaper to use the integer?

CPython的商店引用作为指针指向的PyObject(Jython和IronPython的,写在Java和C#,使用这些语言的含蓄地提及; PyPy,用Python编写的,具有非常灵活的后端,并且可以使用许多不同的策略)

CPython stores references as pointers to PyObject (Jython and IronPython, written in Java and C#, use those language's implicit references; PyPy, written in Python, has a very flexible back-end and can use lots of different strategies)

ID(V)给(上CPython的只)的指针(只是作为一种方便的方法来唯一地标识的对象)的数值。列表可以是异构的(有些项目可能是整数,不同类型的其他对象),所以它只是没有存放一些物品作为指针指向的PyObject和其他人不同的(每个对象也需要一个类型指示,在CPython的,一个明智的选择引用计数,至少) - array.array 是同质的,并且限制因此它可以(和不)确实存放物品的价值观,而不是引用的副本(这是往往比较便宜,但不适合在这里同样的项目出现了不少地方,绝大多数项目都是0的集合,如稀疏矩阵)。

id(v) gives (on CPython only) the numeric value of the pointer (just as a handy way to uniquely identify the object). A list can be heterogeneous (some items may be integers, others objects of different types) so it's just not a sensible option to store some items as pointers to PyObject and others differently (each object also needs a type indication and, in CPython, a reference count, at least) -- array.array is homogeneous and limited so it can (and does) indeed store a copy of the items' values rather than references (this is often cheaper, but not for collections where the same item appears a LOT, such as a sparse array where the vast majority of items are 0).

一个Python实现将由语言规范完全允许尝试优化微妙的技巧,只要preserves语义不变,但据我所知,目前没有人确实为这一具体问题(你可以尝试黑客一个PyPy后端,但不检查是否为INT与非-INT的开销压倒所希望的收益)感到惊讶。

A Python implementation would be fully allowed by the language specs to try subtler tricks for optimization, as long as it preserves semantics untouched, but as far as I know none currently does for this specific issue (you could try hacking a PyPy backend, but don't be surprised if the overhead of checking for int vs non-int overwhelms the hoped-for gains).

此外,将有所作为,如果我
  分配2 ** 64到每个插槽代替
  分配N,当n持有的
  参考2 ** 64?会发生什么事
  我只是写1?

Also, would it make a difference if I assigned 2**64 to every slot instead of assigning n, when n holds a reference to 2**64? What happens when I just write 1?

这些都是实现选择,每一个实现完全允许做例子,因为它不是很难preserve语义(所以假设即使,也就是说,3.1和3.2可以表现不同在这方面)。

These are examples of implementation choices that every implementation is fully allowed to make, as it's not hard to preserve the semantics (so hypothetically even, say, 3.1 and 3.2 could behave differently in this regard).

当您使用int文字(或任何其他文字的不可变型),或其他前pression生产这种类型的结果,它是由执行决定是否做出一个新对象无条件地打字,或花一些时间,这些对象之间的检查,看看是否有一个现有的可重复使用。

When you use an int literal (or any other literal of an immutable type), or other expression producing a result of such a type, it's up to the implementation to decide whether to make a new object of that type unconditionally, or spend some time checking among such objects to see if there's an existing one it can reuse.

在实践中,CPython的(我相信其他实现,但我与他们的内部不太熟悉)使用的单一副本足够的的整数(养了predefined的数组c在的PyObject形式几个小整数值,准备使用或重用的需要),但不出去它在一般的方式来寻找其他现有的可重用对象。

In practice, CPython (and I believe the other implementations, but I'm less familiar with their internals) uses a single copy of sufficiently small integers (keeps a predefined C array of a few small integer values in PyObject form, ready to use or reuse at need) but doesn't go out of its way in general to look for other existing reusable objects.

但是,同样的函数中,例如相同的字面常量被轻易地编译为常量的函数表中的单个不变对象引用,所以这是很容易做到的优化,我相信每一个当前Python实现不执行吧。

But for example identical literal constants within the same function are easily and readily compiled as references to a single constant object in the function's table of constants, so that's an optimization that's very easily done, and I believe every current Python implementation does perform it.

它有时是很难比Python的是要记住的语言的,它有几种实现可能(合法和正确地)中有很多这样的细节有所不同 - 每个人,包括像我这样的书呆子,往往说只是蟒,而不是CPython的说起流行的C codeD执行时(在这样之一,绘画语言和实现之间的区别是非常重要的背景节选;-)。但是,区别的的非常重要,值得在一段时间重复一次。

It can sometimes be hard to remember than Python is a language and it has several implementations that may (legitimately and correctly) differ in a lot of such details -- everybody, including pedants like me, tends to say just "Python" rather than "CPython" when talking about the popular C-coded implementation (excepts in contexts like this one where drawing the distinction between language and implementation is paramount;-). Nevertheless, the distinction is quite important, and well worth repeating once in a while.

这篇关于Python的内存模型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆