Convert pandas Series and DataFrame objects to a NumPy array


Question

I have a pandas Series object that looks like the following:

import numpy as np
import pandas as pd

s1 = pd.Series([0,1,2,3,4,5,6,7,8], index=['AA', 'AB','AC', 'BA','BB','BC','CA','CB','CC'])

I want to convert this Series to a NumPy array as follows:

series_size = s1.size
dimension_len = int(np.sqrt(series_size))
# Note: series_size is always a perfect square

dimension_len determines the size of each dimension of the desired two-dimensional array.

For the Series above, dimension_len = 3, so the desired NumPy array is the following 3 x 3 array:

np.array([[0, 1, 2],
          [3, 4, 5],
          [6, 7, 8]])

DataFrame to NumPy array:

I have a pandas DataFrame object that looks like the following:

s1 = pd.Series([0,1,2,3,4,5,6,7,8], index=['AA', 'AB','AC', 'BA','BB','BC','CA','CB','CC'])
s2 = pd.Series([-2,2], index=['AB','BA'])
s3 = pd.Series([4,3,-3,-4], index=['AC','BC', 'CB','CA'])

df = pd.concat([s1, s2, s3], axis=1)

max_size = max(s1.size, s2.size, s3.size)

dimension_len = int(np.sqrt(max_size))
num_columns = len(df.columns)
# Note: max_size is always a perfect square
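
For readers following along, here is a small sketch of what the concatenated frame contains. It only re-runs the setup above; the names `missing_per_column` and `row_labels` are introduced here for illustration. Because `pd.concat(..., axis=1)` aligns the Series on their index labels, the shorter Series are padded with NaN wherever they have no value, which is where the NaN entries in the desired array come from.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8],
               index=['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC'])
s2 = pd.Series([-2, 2], index=['AB', 'BA'])
s3 = pd.Series([4, 3, -3, -4], index=['AC', 'BC', 'CB', 'CA'])

# Outer-join the three Series on their labels; missing labels become NaN.
df = pd.concat([s1, s2, s3], axis=1)

# Illustrative names (not from the question): NaN count per column
# and the row labels of the aligned frame.
missing_per_column = df.isna().sum().tolist()
row_labels = list(df.index)

print(df)
print(missing_per_column)
```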

The resulting NumPy array will be determined by the following information:

num_columns = determines the number of dimensions of the array
dimension_len = determines the size of each dimension

In the above example, the desired NumPy array will be 3 x 3 x 3 (num_columns = 3 and dimension_len = 3).

Also, the first column of df should become DESIRED_ARRAY[0], the second column DESIRED_ARRAY[1], the third column DESIRED_ARRAY[2], and so on.

The desired array looks like this:

np.array([[[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]],

          [[np.nan, -2, np.nan],
           [2, np.nan, np.nan],
           [np.nan, np.nan, np.nan]],

          [[np.nan, np.nan, 4],
           [np.nan, np.nan, 3],
           [-4, -3, np.nan]],
          ])

Recommended answer

If I understand correctly, you can transpose the DataFrame's values and then reshape them with NumPy:

df.values.T.reshape(-1, int(dimension_len), int(dimension_len))

Out[30]:
array([[[ 0.,  1.,  2.],
        [ 3.,  4.,  5.],
        [ 6.,  7.,  8.]],

       [[nan, -2., nan],
        [ 2., nan, nan],
        [nan, nan, nan]],

       [[nan, nan,  4.],
        [nan, nan,  3.],
        [-4., -3., nan]]])
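
The answer above covers the DataFrame case only. As a rough sketch that also handles the Series case from the first half of the question, the same idea can be wrapped in one function. The helper name `to_square_stack` is hypothetical (it is not part of pandas or NumPy), and the sketch assumes the row count is a perfect square, as the question states.

```python
import math

import numpy as np
import pandas as pd


def to_square_stack(obj):
    """Hypothetical helper: Series -> (n, n) array, DataFrame -> (k, n, n) array."""
    values = obj.to_numpy()
    n_rows = values.shape[0]
    n = math.isqrt(n_rows)
    if n * n != n_rows:
        raise ValueError(f"row count {n_rows} is not a perfect square")
    if values.ndim == 1:
        # Series: fold the flat values into an n x n grid, row by row.
        return values.reshape(n, n)
    # DataFrame: transpose so each column becomes one row of length n * n,
    # then fold every such row into its own n x n grid.
    return values.T.reshape(-1, n, n)


s1 = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8],
               index=['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC'])
s2 = pd.Series([-2, 2], index=['AB', 'BA'])
s3 = pd.Series([4, 3, -3, -4], index=['AC', 'BC', 'CB', 'CA'])
df = pd.concat([s1, s2, s3], axis=1)

series_array = to_square_stack(s1)
frame_array = to_square_stack(df)
print(series_array.shape, frame_array.shape)
```

The transpose matters: without it, reshape would walk the frame row by row and mix values from different columns into the same 3 x 3 slice, so df's first column would no longer end up in the first slice of the result.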
