numpy: slice an array without copying it
Question
I have a large amount of data in a matrix x, and I need to analyze some submatrices.
I am using the following code to select the submatrix:
>>> import numpy as np
>>> x = np.random.normal(0,1,(20,2))
>>> x
array([[-1.03266826, 0.04646684],
[ 0.05898304, 0.31834926],
[-0.1916809 , -0.97929025],
[-0.48837085, -0.62295003],
[-0.50731017, 0.50305894],
[ 0.06457385, -0.10670002],
[-0.72573604, 1.10026385],
[-0.90893845, 0.99827162],
[ 0.20714399, -0.56965615],
[ 0.8041371 , 0.21910274],
[-0.65882317, 0.2657183 ],
[-1.1214074 , -0.39886425],
[ 0.0784783 , -0.21630006],
[-0.91802557, -0.20178683],
[ 0.88268539, -0.66470235],
[-0.03652459, 1.49798484],
[ 1.76329838, -0.26554555],
[-0.97546845, -2.41823586],
[ 0.32335103, -1.35091711],
[-0.12981597, 0.27591674]])
>>> index = x[:,1] > 0
>>> index
array([ True, True, False, False, True, False, True, True, False,
True, True, False, False, False, False, True, False, False,
False, True], dtype=bool)
>>> x1 = x[index, :] #x1 is a copy of the submatrix
>>> x1
array([[-1.03266826, 0.04646684],
[ 0.05898304, 0.31834926],
[-0.50731017, 0.50305894],
[-0.72573604, 1.10026385],
[-0.90893845, 0.99827162],
[ 0.8041371 , 0.21910274],
[-0.65882317, 0.2657183 ],
[-0.03652459, 1.49798484],
[-0.12981597, 0.27591674]])
>>> x1[0,0] = 1000
>>> x1
array([[ 1.00000000e+03, 4.64668400e-02],
[ 5.89830401e-02, 3.18349259e-01],
[ -5.07310170e-01, 5.03058935e-01],
[ -7.25736045e-01, 1.10026385e+00],
[ -9.08938455e-01, 9.98271624e-01],
[ 8.04137104e-01, 2.19102741e-01],
[ -6.58823174e-01, 2.65718300e-01],
[ -3.65245877e-02, 1.49798484e+00],
[ -1.29815968e-01, 2.75916735e-01]])
>>> x
array([[-1.03266826, 0.04646684],
[ 0.05898304, 0.31834926],
[-0.1916809 , -0.97929025],
[-0.48837085, -0.62295003],
[-0.50731017, 0.50305894],
[ 0.06457385, -0.10670002],
[-0.72573604, 1.10026385],
[-0.90893845, 0.99827162],
[ 0.20714399, -0.56965615],
[ 0.8041371 , 0.21910274],
[-0.65882317, 0.2657183 ],
[-1.1214074 , -0.39886425],
[ 0.0784783 , -0.21630006],
[-0.91802557, -0.20178683],
[ 0.88268539, -0.66470235],
[-0.03652459, 1.49798484],
[ 1.76329838, -0.26554555],
[-0.97546845, -2.41823586],
[ 0.32335103, -1.35091711],
[-0.12981597, 0.27591674]])
>>>
But I would like x1 to be only a pointer or something similar. Copying the data every time I need a submatrix is too expensive for me. How can I do that?
Apparently there is no solution with a numpy array. Is a pandas DataFrame better from this point of view?
Recommended answer
The information for your array x is summarized in its .__array_interface__ property:
In [433]: x.__array_interface__
Out[433]:
{'descr': [('', '<f8')],
'strides': None,
'data': (171396104, False),
'typestr': '<f8',
'version': 3,
'shape': (20, 2)}
It has the array shape, strides (default here), and a pointer to the data buffer. A view can point to the same data buffer (possibly further along), and have its own shape and strides.
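As a minimal sketch of what this means in practice (the slice bounds and the name v are chosen only for illustration), basic slicing can be described by a new offset, shape and strides alone, so it returns a view on the same buffer:

```python
import numpy as np

x = np.random.normal(0, 1, (20, 2))   # float64, C-contiguous, owns its data

# Basic slicing: every second row from row 5 up to (not including) row 15.
# This needs only a new start offset, shape and strides, so no data is copied.
v = x[5:15:2, :]

print(np.shares_memory(x, v))          # the view and x use the same buffer
print(v.shape, v.strides)              # the view has its own shape and strides

# The view's data pointer sits "further along" in x's buffer:
# 5 rows * 16 bytes per row = 80 bytes past the start.
offset = v.__array_interface__['data'][0] - x.__array_interface__['data'][0]
print(offset)

# Writing through the view changes x itself.
v[0, 0] = 1000.0
print(x[5, 0])
```

Whenever the rows you need can be expressed as start:stop:step, this kind of slice avoids the copy entirely.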
But indexing with your boolean array can't be summarized in those few numbers. Either it has to carry the index array all the way through, or copy the selected items from the x data buffer. numpy chooses to copy. You have a choice of when to apply the index: now, or further down the calling stack.
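The following sketch illustrates both points under some assumptions: the seed is arbitrary, and column_mean is a hypothetical helper that stands in for whatever analysis is actually run. It confirms that the boolean selection is a copy, then shows one way to defer the indexing by passing x and the mask around together:

```python
import numpy as np

rng = np.random.default_rng(0)         # arbitrary seed, for repeatability only
x = rng.normal(0, 1, (20, 2))
index = x[:, 1] > 0                    # boolean mask, one entry per row

# Boolean (advanced) indexing returns a copy, not a view.
x1 = x[index, :]
print(np.shares_memory(x, x1))         # x1 has its own buffer

# Deferring the index: pass x and the mask, and select only where needed.
# column_mean is a hypothetical helper, not part of numpy.
def column_mean(data, mask, col):
    # The temporary copy here covers one column of the selected rows only.
    return data[mask, col].mean()

print(column_mean(x, index, 0))

# Assignment through the mask writes into x directly, with no intermediate x1.
x[index, 0] = 1000.0
print(x[index, 0])
```

The mask itself is small (one byte per row), so carrying it along is cheap compared with copying the selected rows up front.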