numpy slice an array without copying it


Problem description

I have a large amount of data in a matrix x and I need to analyze some submatrices.

I am using the following code to select the submatrix:

>>> import numpy as np
>>> x = np.random.normal(0,1,(20,2))
>>> x
array([[-1.03266826,  0.04646684],
       [ 0.05898304,  0.31834926],
       [-0.1916809 , -0.97929025],
       [-0.48837085, -0.62295003],
       [-0.50731017,  0.50305894],
       [ 0.06457385, -0.10670002],
       [-0.72573604,  1.10026385],
       [-0.90893845,  0.99827162],
       [ 0.20714399, -0.56965615],
       [ 0.8041371 ,  0.21910274],
       [-0.65882317,  0.2657183 ],
       [-1.1214074 , -0.39886425],
       [ 0.0784783 , -0.21630006],
       [-0.91802557, -0.20178683],
       [ 0.88268539, -0.66470235],
       [-0.03652459,  1.49798484],
       [ 1.76329838, -0.26554555],
       [-0.97546845, -2.41823586],
       [ 0.32335103, -1.35091711],
       [-0.12981597,  0.27591674]])
>>> index = x[:,1] > 0
>>> index
array([ True,  True, False, False,  True, False,  True,  True, False,
        True,  True, False, False, False, False,  True, False, False,
       False,  True], dtype=bool)
>>> x1 = x[index, :] #x1 is a copy of the submatrix
>>> x1
array([[-1.03266826,  0.04646684],
       [ 0.05898304,  0.31834926],
       [-0.50731017,  0.50305894],
       [-0.72573604,  1.10026385],
       [-0.90893845,  0.99827162],
       [ 0.8041371 ,  0.21910274],
       [-0.65882317,  0.2657183 ],
       [-0.03652459,  1.49798484],
       [-0.12981597,  0.27591674]])
>>> x1[0,0] = 1000
>>> x1
array([[  1.00000000e+03,   4.64668400e-02],
       [  5.89830401e-02,   3.18349259e-01],
       [ -5.07310170e-01,   5.03058935e-01],
       [ -7.25736045e-01,   1.10026385e+00],
       [ -9.08938455e-01,   9.98271624e-01],
       [  8.04137104e-01,   2.19102741e-01],
       [ -6.58823174e-01,   2.65718300e-01],
       [ -3.65245877e-02,   1.49798484e+00],
       [ -1.29815968e-01,   2.75916735e-01]])
>>> x
array([[-1.03266826,  0.04646684],
       [ 0.05898304,  0.31834926],
       [-0.1916809 , -0.97929025],
       [-0.48837085, -0.62295003],
       [-0.50731017,  0.50305894],
       [ 0.06457385, -0.10670002],
       [-0.72573604,  1.10026385],
       [-0.90893845,  0.99827162],
       [ 0.20714399, -0.56965615],
       [ 0.8041371 ,  0.21910274],
       [-0.65882317,  0.2657183 ],
       [-1.1214074 , -0.39886425],
       [ 0.0784783 , -0.21630006],
       [-0.91802557, -0.20178683],
       [ 0.88268539, -0.66470235],
       [-0.03652459,  1.49798484],
       [ 1.76329838, -0.26554555],
       [-0.97546845, -2.41823586],
       [ 0.32335103, -1.35091711],
       [-0.12981597,  0.27591674]])
>>> 

But I would like x1 to be only a pointer or something like it. Copying the data every time I need a submatrix is too expensive for me. How can I do that?

Apparently there is no solution with a numpy array. Is a pandas DataFrame better from this point of view?

Recommended answer

The information for your array x is summarized in its .__array_interface__ property:

In [433]: x.__array_interface__
Out[433]: 
{'descr': [('', '<f8')],
 'strides': None,
 'data': (171396104, False),
 'typestr': '<f8',
 'version': 3,
 'shape': (20, 2)}

It has the array shape, strides (the default here, shown as None), and a pointer to the data buffer. A view can point to the same data buffer (possibly further along), and have its own shape and strides.
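As a rough illustration of the point above, the following sketch compares an array with a view created by basic slicing. The array contents and the names v, base_ptr, view_ptr, and offset are assumptions made for this example and are not part of the original answer.

```python
import numpy as np

# Deterministic stand-in for the question's (20, 2) float64 array.
x = np.arange(40, dtype=np.float64).reshape(20, 2)

# Basic slicing (start:stop:step) returns a view; no data is copied.
v = x[4:10:2, :]

# Both arrays report a pointer into the same data buffer.
base_ptr = x.__array_interface__['data'][0]
view_ptr = v.__array_interface__['data'][0]

# The view starts 4 rows further along: 4 rows * 2 columns * 8 bytes.
offset = view_ptr - base_ptr

# The view has its own shape and strides (every second row of x).
print(v.shape, v.strides, offset)

# Confirm that the two arrays overlap in memory.
shares = np.shares_memory(x, v)

# Writing through the view changes the original array.
v[0, 0] = 1000.0
print(x[4, 0])
```

A view like this is cheap because only the shape, strides, and starting pointer differ from the original; the buffer itself is shared.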

But indexing with a boolean array can't be summarized in those few numbers. Either the result has to carry the index array all the way through, or it has to copy the selected items from the x data buffer. numpy chooses to copy. You have a choice of when to apply the index: now, or further down the call stack.
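One way to read that advice is sketched below, assuming a deterministic stand-in for the question's random data. The helper column_mean is hypothetical and only illustrates passing x and index around together instead of materializing x1 up front.

```python
import numpy as np

# Deterministic stand-in for the question's data: column 1 is
# positive only for rows 10..19.
x = np.arange(40, dtype=np.float64).reshape(20, 2) - 20.0
index = x[:, 1] > 0

# Boolean indexing on the right-hand side produces a copy.
x1 = x[index, :]
is_copy = not np.shares_memory(x, x1)

# Assigning through the mask writes into x itself, so no
# intermediate submatrix is needed to modify the selected rows.
x[index, 0] = 1000.0

def column_mean(data, mask, col):
    """Hypothetical helper: mean of one column over the masked rows."""
    # The mask is applied here, at the point of use. This still copies,
    # but only the selected entries of a single column.
    return data[mask, col].mean()

m = column_mean(x, index, 0)
print(is_copy, x1[0, 0], m)
```

Deferring the mask does not avoid copying altogether; it limits each copy to the entries actually needed at that step, while the boolean array itself costs one byte per row.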
