什么是相对于对象的数组numpy的与问候内存列出清单的好处/缺点? [英] What are the benefits / drawbacks of a list of lists compared to a numpy array of OBJECTS with regards to MEMORY?

查看:161
本文介绍了什么是相对于对象的数组numpy的与问候内存列出清单的好处/缺点?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图理解记忆和其它开销的影响,使用 numpy的列表将对 DTYPE 对象相比列表的列表。

I'm trying to understand the memory and other overhead implications that using numpy lists would have for arrays of dtype object compared to lists of lists.

这是否更改与维?如二维和三维VS N-D。

Does this change with dimensionality? eg 2D vs 3D vs N-D.

一些好处,我可以想到用 numpy的阵列时,是物像的 .shape .T 并可以施展他们与矩阵 np.matrix 更快。

Some of the the benefits I can think of when using numpy arrays are that things like .shape, .T and that you can cast them as matrices with np.matrix much faster.

还有什么?

此外,如果有人有兴趣,我使用的对象是:

Also, if anyone is interested the object I'm using is:

import gmpy2 as gm
gm.mpfr( '0' )                                           # <-- this is the object

编辑:

只是为了澄清我很感兴趣,其中 numpy的数组类型是对象不是本地人numpy的情况下型。

Just to clarify I'm interested in the case where the numpy array type is object not a native numpy-type.

编辑2:

相关跟进问候速度。

<一个href=\"http://stackoverflow.com/questions/26768013/what-are-the-benefits-drawbacks-of-a-list-of-lists-compared-to-a-numpy-array-o\">What好处/比较对象的数组numpy的与问候SPEED名单列表的弊端?

推荐答案

我要回答你的首要问题,而假他人(转置的性能等)进行。所以:

I'm going to answer your primary question, and leave the others (performance of transpose, etc.) out. So:

我试图理解记忆和其它开销的影响,使用numpy的名单将有...只是为了澄清我感兴趣的地方numpy的数组类型的话对象不是浮动双击 INT

I'm trying to understand the memory and other overhead implications that using numpy lists would have … Just to clarify I'm interested in the case where the numpy array type is object not a float, double or int

一个Python列表是指向Python对象包装你存储任何实际值它加一些额外的松弛,使其能够在飞行中有效地扩大数组。让我们称之为懈怠的20%,只是为了便于计算的缘故。例如,10000 32位整数的列表占用了,说,96000字节数组,加上Python的整数对象周围24万字节,再加上一个小的开销清单本身,又称80个字节。

A Python list is an array of pointers to Python objects wrapping whatever actual values you've stored in it—plus some extra slack to allow it to be expanded on the fly efficiently. Let's call that slack 20%, just for the sake of easy computation. For example, an list of 10000 32-bit integers takes up, say, 96000 bytes for the array, plus around 240000 bytes for the Python integer objects, plus a small overhead for the list itself, say 80 bytes again.

一个numpy的阵列的任何实际值你存储在一个数组。例如,10000 32位整数的数组占用40000字节,加上为数组本身通过少的开销,假设80个字节。但是,当你使用DTYPE 对象,每个实际值只是一个指向一个Python对象,就像一个列表

A NumPy array is an array of whatever actual values you've stored in it. For example, an array of 10000 32-bit integers takes up 40000 bytes, plus a small overhead for the array itself, say 80 bytes. But when you're using dtype object, each "actual value" is just a pointer to a Python object, just as with a list.

所以,这里的唯一真正的区别是松弛:阵列将要使用320080字节,而该列表将使用336080字节。不是一个巨大的差异,但它能够决定的事情。

So, the only real difference here is the slack: The array is going to use 320080 bytes, while the list is going to use 336080 bytes. Not a huge difference, but it can matter.

此外,没有之一2D变得比其他的快VS ND或沿给定尺寸的大小。

Also, does one become faster than the other in 2D vs ND or with the size along a given dimension.

是的,嵌套的名单由一个巨大的数额增加快...但没有。

Yes, the nested list will increase faster… but not by a huge amount.

在numpy的多维数组存储为一个巨大的阵列(用C或Fortran语言大步顺序),因此无论形状(10000,)(100,100)(10,10,10,10),它的大小相同。 (开销可能会得到几个字节更大的存储有关跨步的更多信息,但如果我们在谈论,说,256字节与80出320K,谁在乎呢?)

A multidimensional array in numpy is stored as one giant array (in either C or Fortran striding order), so whether the shape is (10000,), (100, 100), or (10, 10, 10, 10), it's the same size. (The overhead may get a few bytes bigger to store more information about the striding, but if we're talking about, say, 256 bytes vs. 80 out of 320K, who cares?)

有一个嵌套列表,而另一方面,有更多的名单,有松弛和开销在各个层面。例如,的整数10列出10列出10列出清单有1 + 10 + 100 + 1000阵列的12球,和1 + 10 + 100 + 1000列表头。

A nested list, on the other hand, has more lists, with slack and overhead at every level. For example, a list of 10 lists of 10 lists of 10 lists of integers has 1+10+100+1000 arrays of 12 pointers, and 1+10+100+1000 list headers.

因此​​,阵列仍然使用320080字节,或也许320256,但该列表是使用435536

So, the array is still using 320080 bytes, or maybe 320256, but the list is using 435536.

如果您想了解更多关于如何列表实施......嗯,这取决于你使用的实施。但是在CPython的,在 C API pretty很多保证,它会被存储的PyObject * 的连续排列,并且追加是分期常量时间pretty很大的实际情况要求它离开松弛的比例增长。而且你可以在头并的来源的,这正是它做什么。 (另外,请记住,您从源获取特定的大小将通常取决于你编译它的平台上。最重要的,因为有指针所有的地方,64位平台往往50-100之间有地方%的速度对象超过32位平台的大多数对象更多的开销。)

If you want to know more about how list is implemented… well, that depends on the implementation you're using. But in CPython, the C API pretty much guarantees that it's going to be storing a contiguous array of PyObject *, and the fact that appending is amortized constant time pretty much requires that it leave slack that grows proportionately. And you can see in the headers and the source that this is exactly what it does. (Also, keep in mind that the specific sizes you get from that source will generally depend on the platform you compile it on. Most importantly, because there are pointers all over the place, 64-bit platforms tend to have somewhere between 50-100% more overhead per object for most objects than 32-bit platforms.)

这篇关于什么是相对于对象的数组numpy的与问候内存列出清单的好处/缺点?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆