"Pivot" a Pandas DataFrame into a 3D numpy array


Problem description

Given a DataFrame with the following structure:

Date     | Site  | Measurement Type | Value
-----------------------------------------------
1/1/2020 | A     | Temperature      | 32.3
1/2/2020 | B     | Humidity         | 70%

I would like to create a 3D "pivot table" where the first axis represents the site, the second the date, and the third the measurement type, with the values stored in the elements.

For example, if I had daily measurements for one week at 5 sites, measuring both Temperature and Humidity, the desired output would be an array with shape (5, 7, 2).

Pandas only seems to support creating 2D pivot tables, but I'm happy with just an unlabeled 3D numpy array as output. I'm wondering whether there is an existing easy way to do this before I spend time implementing it myself.

Recommended answer

This is doable with df.pivot_table. I added one more row to your sample so that both measurement types appear. Missing combinations are represented by np.nan.

Sample `df`:

       Date Site Measurement_Type Value
0  1/1/2020    A      Temperature  32.3
1  1/1/2020    A         Humidity   60%
2  1/2/2020    B         Humidity   70%
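
For anyone who wants to reproduce the snippets in this answer, the sample frame can be built roughly as follows. This is a sketch under one assumption: `Value` is stored as strings, which is what the quoted `'32.3'` in the printed output suggests, since `'60%'` is not numeric.

```python
import pandas as pd

# Sample data from the answer. Value is kept as strings (an assumption
# inferred from the quoted values in the printed output).
df = pd.DataFrame({
    "Date": ["1/1/2020", "1/1/2020", "1/2/2020"],
    "Site": ["A", "A", "B"],
    "Measurement_Type": ["Temperature", "Humidity", "Humidity"],
    "Value": ["32.3", "60%", "70%"],
})
print(df)
```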

Try the following:

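import numpy as np
import pandas as pd

iix = pd.MultiIndex.from_product([np.unique(df.Date), np.unique(df.Measurement_Type)])
# 2D pivot: one row per Site, one column per (Date, Measurement_Type); reindex adds missing pairs as NaN
df_pivot = (df.pivot_table('Value', 'Site', ['Date', 'Measurement_Type'], aggfunc='first')
              .reindex(iix, axis=1))
# Group the columns by Date (note: groupby(..., axis=1) is deprecated as of pandas 2.1)
arr = np.array(df_pivot.groupby(level=0, axis=1).agg(lambda x: [*x.values])
                       .to_numpy().tolist())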

print(arr)

Out[1447]:
array([[['60%', '32.3'],
        [nan, nan]],

       [[nan, nan],
        ['70%', nan]]], dtype=object)
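
On recent pandas versions, where `groupby(..., axis=1)` is deprecated, one possible workaround is to reshape the reindexed 2D table directly. This is a sketch, not part of the original answer: it relies on the columns being ordered as the full Date x Measurement_Type product (which the `reindex` guarantees), and the names `wide`, `dates` and `types` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["1/1/2020", "1/1/2020", "1/2/2020"],
    "Site": ["A", "A", "B"],
    "Measurement_Type": ["Temperature", "Humidity", "Humidity"],
    "Value": ["32.3", "60%", "70%"],
})

dates = sorted(df["Date"].unique())
types = sorted(df["Measurement_Type"].unique())
cols = pd.MultiIndex.from_product([dates, types])

# One row per Site, one column per (Date, Measurement_Type) pair,
# with absent pairs filled in as NaN by the reindex.
wide = (df.pivot_table(values="Value", index="Site",
                       columns=["Date", "Measurement_Type"], aggfunc="first")
          .reindex(columns=cols))

# Columns are ordered date-major, so the column axis splits cleanly in two.
arr = wide.to_numpy().reshape(len(wide.index), len(dates), len(types))
print(arr.shape)
```

Because the column axis is laid out date-major, the reshape splits it into (date, measurement type) without moving any values.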

Method 2: use pivot_table on different columns, then numpy reshape

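# Full (Site, Date) grid, so that missing combinations become NaN rows
iix_n = pd.MultiIndex.from_product([np.unique(df.Site), np.unique(df.Date)])
# Rows are (Site, Date) pairs and columns are measurement types; reshape splits the row axis
arr = (df.pivot_table('Value', ['Site', 'Date'], 'Measurement_Type', aggfunc='first')
         .reindex(iix_n).to_numpy()
         .reshape(df.Site.nunique(), df.Date.nunique(), -1))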

Out[1501]:
array([[['60%', '32.3'],
        [nan, nan]],

       [[nan, nan],
        ['70%', nan]]], dtype=object)
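
Since the result is an unlabeled array, it can help to keep the axis labels alongside it. The sketch below is a variant of the same idea that uses `groupby(...).first()` instead of `pivot_table`. The helper name `pivot_to_3d` is hypothetical, and the sample values are assumed to be numeric here so the array comes out as floats.

```python
import numpy as np
import pandas as pd


def pivot_to_3d(df, axes, value):
    """Hypothetical helper: reshape a long DataFrame into an N-D array.

    Returns the array and, for each axis, the sorted labels along it.
    """
    labels = [sorted(df[col].unique()) for col in axes]
    full_index = pd.MultiIndex.from_product(labels, names=axes)
    # Keep the first value per combination; absent combinations become NaN.
    flat = df.groupby(list(axes))[value].first().reindex(full_index)
    return flat.to_numpy().reshape([len(lab) for lab in labels]), labels


# Numeric variant of the sample data (an assumption for this sketch).
df = pd.DataFrame({
    "Date": ["1/1/2020", "1/1/2020", "1/2/2020"],
    "Site": ["A", "A", "B"],
    "Measurement_Type": ["Temperature", "Humidity", "Humidity"],
    "Value": [32.3, 60.0, 70.0],
})

arr, (sites, dates, types) = pivot_to_3d(
    df, ["Site", "Date", "Measurement_Type"], "Value")

# Look up a single element by its labels.
value = arr[sites.index("B"), dates.index("1/2/2020"), types.index("Humidity")]
print(arr.shape, value)
```

The order of `axes` determines the axis order of the array, so passing site, date, measurement type gives the layout asked for in the question.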
