Convert pandas column of numpy arrays to numpy array of higher dimension


Question

I have a pandas DataFrame of shape (75, 9).

Only one of those columns holds numpy arrays, each of shape (100, 4, 3).

I am seeing something strange:

data = self.df[self.column_name].values[0]

has shape (100, 4, 3), but

data = self.df[self.column_name].values

has shape (75,), and its min and max are reported as 'not a numeric object'.

I expected data = self.df[self.column_name].values to have shape (75, 100, 4, 3), with a numeric min and max.

How can I make a column of numpy arrays behave like a single numpy array of one higher dimension, whose length is the number of rows in the DataFrame?

To reproduce:

    import numpy as np
    import pandas as pd

    some_df = pd.DataFrame(columns=['A'])
    for i in range(10):
        # each row holds one (4, 6) array
        some_df.loc[i] = [np.random.rand(4, 6)]
    print(some_df['A'].values.shape)
    print(some_df['A'].values[0].shape)

This prints (10L,) and (4L, 6L) instead of the desired (10L, 4L, 6L) and (4L, 6L).

Recommended answer

In [42]: some_df = pd.DataFrame(columns=['A']) 
    ...: for i in range(4): 
    ...:         some_df.loc[i] = [np.random.randint(0,10,(1,3))] 
    ...:                                                                                  
In [43]: some_df                                                                          
Out[43]: 
             A
0  [[7, 0, 9]]
1  [[3, 6, 8]]
2  [[9, 7, 6]]
3  [[1, 6, 3]]

The column's numpy values form a one-dimensional object dtype array whose elements are themselves arrays:

In [44]: some_df['A'].to_numpy()                                                          
Out[44]: 
array([array([[7, 0, 9]]), array([[3, 6, 8]]), array([[9, 7, 6]]),
       array([[1, 6, 3]])], dtype=object)
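
The same check can be run as a script. This is a minimal sketch, assuming a small DataFrame built directly from a list of arrays; the names `df` and `col` and the sample values are illustrative, not from the question:

```python
import numpy as np
import pandas as pd

# Illustrative data: four rows, each holding a (1, 3) integer array.
df = pd.DataFrame({'A': [np.arange(3).reshape(1, 3) + i for i in range(4)]})

col = df['A'].to_numpy()
print(col.dtype)     # object: each element is itself an ndarray
print(col.shape)     # one entry per row; the inner dimensions are not part of it
print(col[0].shape)  # shape of a single stored array
```

Because the outer array only knows it holds four Python objects, its shape has a single axis, which is why the question sees (75,) rather than (75, 100, 4, 3).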

If those arrays all have the same shape, np.stack does a nice job of joining them along a new leading dimension:

In [45]: np.stack(some_df['A'].to_numpy())                                                
Out[45]: 
array([[[7, 0, 9]],

       [[3, 6, 8]],

       [[9, 7, 6]],

       [[1, 6, 3]]])
In [46]: _.shape                                                                          
Out[46]: (4, 1, 3)

This only works with one column. Like every function in the concatenate family, stack treats its input argument as an iterable, effectively a list of arrays, so a plain list of the column's arrays works equally well:

In [48]: some_df['A'].to_list()                                                           
Out[48]: 
[array([[7, 0, 9]]),
 array([[3, 6, 8]]),
 array([[9, 7, 6]]),
 array([[1, 6, 3]])]
In [50]: np.stack(some_df['A'].to_list()).shape                                           
Out[50]: (4, 1, 3)
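
Applied to the reproduction in the question, the approach might look like the sketch below. It assumes 10 rows of (4, 6) arrays built from a list rather than row by row. The explicit shape check is an added precaution, not part of the original answer, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: 10 rows of (4, 6) arrays.
rng = np.random.default_rng(0)
some_df = pd.DataFrame({'A': [rng.random((4, 6)) for _ in range(10)]})

arrays = some_df['A'].to_list()

# np.stack needs every element to have the same shape; check first so a
# mismatch produces a readable error message.
shapes = {a.shape for a in arrays}
if len(shapes) != 1:
    raise ValueError(f"cannot stack arrays of differing shapes: {shapes}")

data = np.stack(arrays)        # new leading axis = number of rows
print(data.shape)              # (10, 4, 6)
print(data.min(), data.max())  # ordinary numeric reductions now work
```

The stacked result is a regular float array, so min, max and other reductions behave as they would on any numeric ndarray.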
