Different size of element in numpy array and list
Question
I am using 32-bit Python 3.4 on Windows 7.
I found that an integer in a numpy array takes 4 bytes, but in a list it takes 10 bytes.
import sys
import numpy as np

s = 10
lt = [None] * s
cnt = 0
# Fill the preallocated list slot by slot.
for i in range(0, s):
    lt[cnt] = i
    cnt += 1
lt = [x for x in lt if x is not None]
a = np.array(lt)
# Report the array size, its per-element size, and the list size.
print("len(a) is " + str(len(a)) + " size is " + str(sys.getsizeof(a))
      + " bytes " + " a.itemsize is " + str(a.itemsize) + " total size is "
      + str(a.itemsize * len(a)) + " Bytes , len(lt) is "
      + str(len(lt)) + " size is " + str(sys.getsizeof(lt)) + " Bytes ")
len(a) is 10 size is 40 bytes a.itemsize is 4 total size is 40 Bytes , len(lt) is 10 size is 100 Bytes the fist element has 12 Bytes
Is this because each element in a list has to keep a pointer to the next element?
If I assign a string to the list instead:
lt[cnt] = "A";
len(a) is 10 size is 40 bytes a.itemsize is 4 total size is 40 Bytes , len(lt) is 10 size is 100 Bytes the fist element has 30 Bytes
So, in the array, each element has 4 bytes, and in the list it has 30 bytes.
But if I try:
lt[cnt] = "AB";
len(a) is 10 size is 40 bytes a.itemsize is 8 total size is 80 Bytes , len(lt) is 10 size is 100 Bytes the fist element has 33 Bytes
In the array, each element has 8 bytes, but in the list it has 33 bytes.
And if I try:
lt[cnt] = "csedvserb revrvrrw gvrgrwgervwe grujy oliulfv qdqdqafwg5u u56i78k8 awdwfw"; # 73 characters long
len(a) is 10 size is 40 bytes a.itemsize is 292 total size is 2920 Bytes , len(lt) is 10 size is 100 Bytes the fist element has 246 Bytes
In the array, each element has 292 bytes (= 73 * 4), but in the list it has 246 bytes?
Any explanation will be appreciated.
Recommended answer
The element size in arrays is easy: it's determined by the dtype and, as your code shows, can be found with .itemsize. 4 bytes is common, such as for np.int32 or np.float32; np.float64 uses 8. Unicode strings are also allocated 4 bytes per character, even though encodings such as UTF-8 use a variable number of bytes per character.
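As a rough illustration of the point above, the following sketch prints the itemsize of a few dtypes and of two unicode arrays. The variable names (short, long_, utf8_lengths) are made up for this example, and the 73-character string is just a stand-in for the one in the question.

```python
import numpy as np

# Fixed-width numeric dtypes: itemsize follows directly from the dtype.
for dtype in (np.int32, np.float32, np.float64):
    print(np.dtype(dtype).name, np.dtype(dtype).itemsize)

# Unicode arrays reserve 4 bytes per character, sized to the longest string.
short = np.array(["AB"] * 10)
long_ = np.array(["x" * 73] * 10)
print(short.dtype, short.itemsize)                # itemsize is 2 * 4 = 8
print(long_.dtype, long_.itemsize, long_.nbytes)  # 73 * 4 = 292 per element

# By contrast, UTF-8 uses a variable number of bytes per character.
utf8_lengths = [len(ch.encode("utf-8")) for ch in ("A", "é", "€")]
print(utf8_lengths)
```

Every element of a unicode array gets the same fixed width, so one long string makes every slot that wide.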
The per-element size for lists (and tuples) is trickier. A list does not contain the elements directly; rather, it contains pointers to objects which are stored elsewhere. The size reported for your list covers the list header, the pointers, and some padding. The padding lets the list grow (with .append) efficiently. All your lists have the same size, regardless of the size of the first item.
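To make the pointer picture concrete, here is a minimal sketch, assuming CPython. The helper total_size is hypothetical (not part of sys or numpy); it adds the size of the list object to the sizes of the distinct objects it points to.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # 4 on a 32-bit build, 8 on a 64-bit build


def total_size(lst):
    """List object plus each distinct element object it points to (hypothetical helper)."""
    distinct = {id(x): x for x in lst}
    return sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in distinct.values())


short_items = ["A"] * 10
long_items = ["x" * 73] * 10

# The list objects are the same size: each holds 10 pointers.
print(sys.getsizeof(short_items), sys.getsizeof(long_items))
# The string objects they point to differ in size.
print(sys.getsizeof("A"), sys.getsizeof("x" * 73))
print(total_size(short_items), total_size(long_items))

# Space taken by the 10 pointers alone (CPython-specific assumption).
pointer_bytes = sys.getsizeof([None] * 10) - sys.getsizeof([])
print(pointer_bytes, 10 * POINTER_SIZE)
```

sys.getsizeof on a list never follows the pointers, which is why the question saw the same list size for every kind of element.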
My data:
In [2324]: lt=[None]*10
In [2325]: sys.getsizeof(lt)
Out[2325]: 72
In [2326]: lt=[i for i in range(10)]
In [2327]: sys.getsizeof(lt)
Out[2327]: 96
In [2328]: lt=['A' for i in range(10)]
In [2329]: sys.getsizeof(lt)
Out[2329]: 96
In [2330]: lt=['AB' for i in range(10)]
In [2331]: sys.getsizeof(lt)
Out[2331]: 96
In [2332]: lt=['ABCDEF' for i in range(10)]
In [2333]: sys.getsizeof(lt)
Out[2333]: 96
In [2334]: lt=[None for i in range(10)]
In [2335]: sys.getsizeof(lt)
Out[2335]: 96
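The gap between 72 and 96 above is the padding at work: [None]*10 allocates exactly 10 slots, while a list built up element by element is over-allocated. A small sketch of that growth pattern follows, assuming CPython; the exact step sizes vary between versions, so no specific numbers are claimed here.

```python
import sys

growing = []
sizes = []
for i in range(32):
    growing.append(i)
    sizes.append(sys.getsizeof(growing))

# Lengths at which the reported size changed, i.e. where a reallocation happened.
resize_points = [n + 1 for n in range(1, 32) if sizes[n] != sizes[n - 1]]

exact = [None] * 32  # allocated in one step, no spare slots needed
print(sizes)
print(resize_points)
print(sys.getsizeof(exact), sys.getsizeof(growing))
```

The reported size moves in steps rather than with every append, because each reallocation reserves a few spare slots.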
And the corresponding arrays:
In [2344]: lt=[None]*10; a=np.array(lt)
In [2345]: a
Out[2345]: array([None, None, None, None, None, None, None, None, None, None], dtype=object)
In [2346]: a.itemsize
Out[2346]: 4
In [2347]: lt=['AB' for i in range(10)]; a=np.array(lt)
In [2348]: a
Out[2348]:
array(['AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB', 'AB'],
dtype='<U2')
In [2349]: a.itemsize
Out[2349]: 8
When the list contains None, the array has object dtype, and its elements are all pointers (4 bytes each on this 32-bit build).
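A quick way to check this on your own machine is sketched below. POINTER_SIZE is a name introduced for this example, computed with struct.calcsize("P"); it comes out as 4 on a 32-bit build and 8 on a 64-bit one.

```python
import struct

import numpy as np

POINTER_SIZE = struct.calcsize("P")  # native pointer width in bytes

obj_arr = np.array([None] * 10)  # falls back to object dtype
str_arr = np.array(["AB"] * 10)  # fixed-width unicode dtype

# An object array stores one pointer per element, like a list does.
print(obj_arr.dtype, obj_arr.itemsize, POINTER_SIZE)
# A unicode array stores the characters inline, 4 bytes each.
print(str_arr.dtype, str_arr.itemsize)
```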