"Pivot" a Pandas DataFrame into a 3D numpy array
Problem description
Given a DataFrame with the following structure:
Date | Site | Measurement Type | Value
-----------------------------------------------
1/1/2020 | A | Temperature | 32.3
1/2/2020 | B | Humidity | 70%
I would like to create a 3D "pivot table" where the first axis represents site, the second represents date, the third represents measurement type, and the values are stored in the elements.
For example, if I had daily measurements for one week at 5 sites, measuring both Temperature and Humidity, the desired output would be an array with shape (5, 7, 2).
Pandas only seems to support creating 2D pivot tables, but I'm happy with just an unlabeled 3D numpy array as output. Is there an existing easy way to do this before I spend time implementing it myself?
Recommended answer
It is doable using df.pivot_table. I added one more row to your sample so that both Measurement Type values appear. Missing values are represented by np.nan.
sample `df`
Date Site Measurement_Type Value
0 1/1/2020 A Temperature 32.3
1 1/1/2020 A Humidity 60%
2 1/2/2020 B Humidity 70%
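For anyone who wants to reproduce the results, the sample frame above can be rebuilt roughly as follows. This is a minimal sketch; it assumes Value is stored as strings (the "60%" entries are not numeric), which matches the object-dtype output shown in this answer.

```python
import pandas as pd

# Sample data from the answer; Value is kept as strings because "60%" is not numeric.
df = pd.DataFrame({
    "Date": ["1/1/2020", "1/1/2020", "1/2/2020"],
    "Site": ["A", "A", "B"],
    "Measurement_Type": ["Temperature", "Humidity", "Humidity"],
    "Value": ["32.3", "60%", "70%"],
})
print(df)
```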
Try the following:
import numpy as np
import pandas as pd

# All (Date, Measurement_Type) combinations, so every site gets the same columns
iix = pd.MultiIndex.from_product([np.unique(df.Date), np.unique(df.Measurement_Type)])
df_pivot = (df.pivot_table('Value', 'Site', ['Date', 'Measurement_Type'], aggfunc='first')
              .reindex(iix, axis=1))
arr = np.array(df_pivot.groupby(level=0, axis=1).agg(lambda x: [*x.values])
                       .to_numpy().tolist())
print(arr)
Out[1447]:
array([[['60%', '32.3'],
        [nan, nan]],

       [[nan, nan],
        ['70%', nan]]], dtype=object)
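Note that recent pandas releases (2.1 and later) deprecate the axis=1 argument of groupby. The sketch below is a variant of the same idea that is not part of the original answer: it keeps the pivot and reindex steps, then reshapes the 2D result directly instead of grouping columns. The names sites, dates, types, cols and wide are illustrative, and the sample data is assumed to hold Value as strings.

```python
import numpy as np
import pandas as pd

# Sample data from the answer (Value assumed to be strings).
df = pd.DataFrame({
    "Date": ["1/1/2020", "1/1/2020", "1/2/2020"],
    "Site": ["A", "A", "B"],
    "Measurement_Type": ["Temperature", "Humidity", "Humidity"],
    "Value": ["32.3", "60%", "70%"],
})

# Sorted unique labels for each axis.
sites = np.unique(df["Site"])
dates = np.unique(df["Date"])
types = np.unique(df["Measurement_Type"])

# One row per site, one column per (date, type) pair; absent pairs become NaN.
cols = pd.MultiIndex.from_product([dates, types])
wide = (df.pivot_table(values="Value", index="Site",
                       columns=["Date", "Measurement_Type"], aggfunc="first")
          .reindex(index=sites, columns=cols))

# Columns are ordered date-major, so a row-major reshape yields
# an array indexed as [site, date, measurement type].
arr = wide.to_numpy().reshape(len(sites), len(dates), len(types))
print(arr.shape)
```

The reshape relies on the columns being in exact (date, type) product order, which is why the reindex step cannot be skipped.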
Method 2: using pivot_table on different columns and numpy reshape
iix_n = pd.MultiIndex.from_product([np.unique(df.Site), np.unique(df.Date)])
arr = (df.pivot_table('Value', ['Site', 'Date'], 'Measurement_Type', aggfunc='first')
         .reindex(iix_n).to_numpy()
         .reshape(df.Site.nunique(), df.Date.nunique(), -1))
Out[1501]:
array([[['60%', '32.3'],
        [nan, nan]],

       [[nan, nan],
        ['70%', nan]]], dtype=object)
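Because the resulting array is unlabeled, it can help to keep the axis labels alongside it. The following is a self-contained sketch of Method 2 with a label lookup added; the lookup dictionaries (site_pos, date_pos, type_pos) are an illustrative addition, not part of the original answer, and Value is again assumed to be stored as strings.

```python
import numpy as np
import pandas as pd

# Sample data from the answer (Value assumed to be strings).
df = pd.DataFrame({
    "Date": ["1/1/2020", "1/1/2020", "1/2/2020"],
    "Site": ["A", "A", "B"],
    "Measurement_Type": ["Temperature", "Humidity", "Humidity"],
    "Value": ["32.3", "60%", "70%"],
})

sites = np.unique(df["Site"])
dates = np.unique(df["Date"])

# One row per (site, date) pair, one column per measurement type.
rows = pd.MultiIndex.from_product([sites, dates])
table = (df.pivot_table(values="Value", index=["Site", "Date"],
                        columns="Measurement_Type", aggfunc="first")
           .reindex(rows))
types = table.columns.to_numpy()

# Rows are ordered site-major, so reshape gives [site, date, type].
arr = table.to_numpy().reshape(len(sites), len(dates), -1)

# Map each label to its position along the corresponding axis.
site_pos = {s: i for i, s in enumerate(sites)}
date_pos = {d: i for i, d in enumerate(dates)}
type_pos = {t: i for i, t in enumerate(types)}

value = arr[site_pos["B"], date_pos["1/2/2020"], type_pos["Humidity"]]
print(value)
```

One caveat: the dates here are plain strings, so np.unique sorts them lexicographically. If chronological order matters along the date axis, converting the column with pd.to_datetime before pivoting is probably the safer choice.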