Growing matrices columnwise in NumPy

Question

In pure Python you can grow matrices column by column pretty easily:

data = []
for i in something:
    newColumn = getColumnDataAsList(i)
    data.append(newColumn)

NumPy's array doesn't have an append method. The hstack function doesn't work when starting from a zero-sized array (numpy.array([]) is one-dimensional, while the new columns are not), thus the following won't work:

data = numpy.array([])
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    data = numpy.hstack((data, newColumn)) # ValueError: arrays must have same number of dimensions

So, my options are either to move the initialization inside the loop with an appropriate condition:

data = None
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    if data is None:
        data = newColumn
    else:
        data = numpy.hstack((data, newColumn)) # works

... or to use a Python list and convert it to an array later:

data = []
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    data.append(newColumn)
data = numpy.array(data)

Both variants seem a little bit awkward to me. Are there nicer solutions?

Answer

NumPy actually does have an append function (numpy.append), which seems to do what you want, e.g.:

import numpy as NP
# random_integers is deprecated; randint's upper bound is exclusive, so this still draws 0..9
my_data = NP.random.randint(0, 10, 9).reshape(3, 3)
new_col = NP.array((5, 5, 5)).reshape(3, 1)
res = NP.append(my_data, new_col, axis=1)
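Applied to the question's loop, a minimal sketch (assuming the number of rows is known up front; `grow_columnwise` is a hypothetical helper name) starts from a 2-D array with zero columns. That way `np.append(..., axis=1)` has matching dimensions on the very first iteration and the `None` check is unnecessary:

```python
import numpy as np


def grow_columnwise(columns, n_rows):
    """Append each column to a 2-D array, starting from zero columns."""
    # Shape (n_rows, 0) is already 2-D, so axis=1 appends line up from the start.
    data = np.empty((n_rows, 0))
    for col in columns:
        # np.append with an axis needs matching ndim, so reshape to a column.
        data = np.append(data, np.asarray(col).reshape(n_rows, 1), axis=1)
    return data


result = grow_columnwise([[1, 2, 3], [4, 5, 6]], n_rows=3)
print(result)
# [[1. 4.]
#  [2. 5.]
#  [3. 6.]]
```

Each call still copies the whole array, so this only tidies the loop; it does not make it faster.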

Your second snippet (hstack) will work if you add one more line that turns the new column into a 2-D column, e.g.:

# random_integers is deprecated; randint's upper bound is exclusive, so this still draws 0..9
my_data = NP.random.randint(0, 10, 16).reshape(4, 4)
# the line to add--does not depend on array dimensions
new_col = NP.zeros_like(my_data[:,-1]).reshape(-1, 1)
res = NP.hstack((my_data, new_col))
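The reshape to `(-1, 1)` is what makes this work. For 2-D inputs `np.hstack` joins along axis 1, so a 1-D column has the wrong number of dimensions; the same mismatch is behind the `ValueError` in the question, where a 1-D `numpy.array([])` meets 2-D columns. A small sketch with made-up data:

```python
import numpy as np

my_data = np.arange(16).reshape(4, 4)
flat_col = np.zeros(4)  # 1-D, shape (4,)

# hstack joins 2-D inputs along axis 1, so a 1-D column has the wrong ndim.
try:
    np.hstack((my_data, flat_col))
    failed = False
except ValueError:
    failed = True

# Reshaping to (4, 1) gives a proper column and the join succeeds.
res = np.hstack((my_data, flat_col.reshape(-1, 1)))
print(failed, res.shape)  # True (4, 5)
```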

hstack gives the same result as concatenate((my_data, new_col), axis=1); I'm not sure how they compare performance-wise.
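In NumPy's source, `np.hstack` is implemented on top of `np.concatenate`, so for 2-D inputs their cost should be essentially the same; the sketch below only checks that the results match. It also shows `np.column_stack`, which the answer does not mention. It fits the question's list-then-convert variant: `numpy.array(list_of_columns)` would put each 1-D column in a row, whereas `column_stack` keeps them as columns.

```python
import numpy as np

my_data = np.arange(16).reshape(4, 4)
new_col = np.zeros((4, 1), dtype=my_data.dtype)

via_hstack = np.hstack((my_data, new_col))
via_concat = np.concatenate((my_data, new_col), axis=1)
print(np.array_equal(via_hstack, via_concat))  # True

# column_stack treats 1-D inputs as columns, so a plain list of
# 1-D columns becomes a matrix in one call.
cols = [np.arange(3), np.arange(3, 6)]
stacked = np.column_stack(cols)
print(stacked.tolist())  # [[0, 3], [1, 4], [2, 5]]
```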

While that's the most direct answer to your question, I should mention that looping through a data source to populate a target via append, while just fine in Python, is not idiomatic NumPy. Here's why:

Initializing a NumPy array is relatively expensive, and with this conventional Python pattern you incur that cost, more or less, at each loop iteration: each append to a NumPy array allocates a new array with a different size and copies the existing data into it.
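A quick check of that claim, plus a rough cost model that counts only element writes and ignores allocation overhead (`elements_copied` is a hypothetical helper for illustration):

```python
import numpy as np

a = np.zeros((3, 2))
b = np.append(a, np.ones((3, 1)), axis=1)

# np.append never grows `a` in place; it returns a newly allocated array.
print(a.shape, b.shape)        # (3, 2) (3, 3)
print(np.shares_memory(a, b))  # False


def elements_copied(n_rows, n_cols):
    """Elements written when building n_cols columns one append at a time."""
    # The k-th append writes all k columns (the old ones plus the new one).
    return sum(n_rows * k for k in range(1, n_cols + 1))


print(elements_copied(1000, 100))  # 5050000, versus 100000 elements in the result
```

Under this model the work grows quadratically with the number of columns, while filling a pre-allocated array writes each element once.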

For that reason, the common pattern in NumPy for iteratively adding columns to a 2D array is to initialize an empty target array once (i.e., pre-allocate a single 2D NumPy array that already has all of the empty columns) and then successively populate those columns by indexing the desired column-wise offset. This is much easier to show than to explain:

>>> # initialize your skeleton array using 'empty', which skips initializing the values
>>> M = NP.empty(shape=(10, 5), dtype=float)

>>> # create a small function to mimic step-wise populating this empty 2D array:
>>> fnx = lambda v : NP.random.randint(0, 10, v)

Populate the NumPy array as in the OP, except that each iteration just resets the values of M at successive column-wise offsets:

>>> for index in range(5):
...     M[:,index] = fnx(10)

>>> M
  array([[ 1.,  7.,  0.,  8.,  7.],
         [ 9.,  0.,  6.,  9.,  4.],
         [ 2.,  3.,  6.,  3.,  4.],
         [ 3.,  4.,  1.,  0.,  5.],
         [ 2.,  3.,  5.,  3.,  0.],
         [ 4.,  6.,  5.,  6.,  2.],
         [ 0.,  6.,  1.,  6.,  8.],
         [ 3.,  8.,  0.,  8.,  0.],
         [ 5.,  2.,  5.,  0.,  1.],
         [ 0.,  6.,  5.,  9.,  1.]])
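The same pattern outside the interactive session, as a minimal sketch. `fill_columns` and the column source passed to it are hypothetical stand-ins for the question's `getColumnDataAsNumpyArray`:

```python
import numpy as np


def fill_columns(n_rows, n_cols, make_column):
    """Pre-allocate once, then write each column in place."""
    # np.empty leaves values uninitialized, which is fine because
    # every column is overwritten below.
    out = np.empty((n_rows, n_cols), dtype=float)
    for j in range(n_cols):
        out[:, j] = make_column(j)  # in-place write, no reallocation
    return out


# Hypothetical column source: column j holds j, j+1, ..., j+3.
filled = fill_columns(4, 3, lambda j: np.arange(j, j + 4))
print(filled)
# [[0. 1. 2.]
#  [1. 2. 3.]
#  [2. 3. 4.]
#  [3. 4. 5.]]
```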

Of course, if you don't know in advance what size your array should be, just create one much bigger than you need and trim the unused portions when you finish populating it:

>>> M[:3,:3]
  array([[ 1.,  7.,  0.],
         [ 9.,  0.,  6.],
         [ 2.,  3.,  6.]])
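If the initial guess might still be too small, one option is to double the buffer when it fills up. This is an assumed policy, not part of the answer. In this sketch `collect_columns` is a hypothetical helper, the number of rows is assumed known, and only columns are trimmed:

```python
import numpy as np


def collect_columns(columns, n_rows, capacity=8):
    """Fill an over-allocated buffer column by column, then trim it."""
    buf = np.empty((n_rows, capacity))
    used = 0
    for col in columns:
        if used == buf.shape[1]:
            # Guess was too small: double the capacity (assumed policy)
            # and copy the filled part into the new buffer.
            bigger = np.empty((n_rows, max(1, 2 * buf.shape[1])))
            bigger[:, :used] = buf[:, :used]
            buf = bigger
        buf[:, used] = col
        used += 1
    # buf[:, :used] is a view into the big buffer; .copy() lets it be freed.
    return buf[:, :used].copy()


result = collect_columns((np.arange(3) * k for k in range(5)), n_rows=3, capacity=2)
print(result.shape)  # (3, 5)
```

Doubling keeps the number of reallocations logarithmic in the final column count, so most iterations are plain in-place writes.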
