Pandas数据框取消堆栈数据并创建新列 [英] Pandas dataframe unstack data and create new columns
问题描述
我有两组堆叠的数据,如下所示:
I have two sets of stacked data as follows:
set n value_1 value_2
0 1 1024 25942.6 25807.8 ----> first set starts here
1 1 2048 72000.5 71507.9
2 1 4096 161095.0 160303.0
3 1 8192 356419.0 354928.0
4 1 16384 793562.0 788666.0
5 1 32768 1914250.0 1889850.0
6 1 65536 3490860.0 3479040.0
7 1 131072 8096130.0 8036290.0
8 1 262144 16616500.0 16525400.0
11 2 1024 35116.3 35032.5 ----> second set starts here
12 2 2048 98783.8 98507.0
13 2 4096 230813.0 230206.0
14 2 8192 521754.0 518052.0
15 2 16384 1046870.0 1040990.0
16 2 32768 2118340.0 2112680.0
17 2 65536 4693000.0 4673130.0
18 2 131072 9960240.0 9892870.0
19 2 262144 21230600.0 21068700.0
如何解叠它们,以便获得两个新列 value_1_2
和 value_2_2
,它们对应于第二组数据并与基于的第一组数据匹配 n
的值?
How can I unstack them so that I get two new columns value_1_2
, and value_2_2
, which correspond to the second set of data and match to the first one based on the value of n
?
这就是我想要的:
n value_1 value_2 value_1_2 value_2_2
1024 25942.6 25807.8 35116.3 35032.5
2048 72000.5 71507.9 98783.8 98507
4096 161095 160303 230813 230206
8192 356419 354928 521754 518052
16384 793562 788666 1046870 1040990
32768 1914250 1889850 2118340 2112680
65536 3490860 3479040 4693000 4673130
131072 8096130 8036290 9960240 9892870
262144 16616500 16525400 21230600 21068700
推荐答案
First we create a list of dfs
by using DataFrame.groupby
to group the dataframe on the column Set
, then for each group in the dataframe we use DataFrame.add_suffix
to add the group identifier to each of the columns:
最后,我们使用 functools.reduce
通过应用
Finally, we use functools.reduce
to reduce the list of dataframes dfs
to the single unstacked dataframe by applying pd.merge
operation on the consecutive dataframe on the column n
.
from functools import reduce
dfs = [
g.drop('set', 1).add_suffix(f'_{k}').rename({f'n_{k}': 'n'}, axis=1)
for k, g in df.groupby('set')
]
df1 = reduce(lambda x, y: pd.merge(x, y, on='n'), dfs)
结果:
# print(df1)
n value_1_1 value_2_1 value_1_2 value_2_2
0 1024 25942.6 25807.8 35116.3 35032.5
1 2048 72000.5 71507.9 98783.8 98507.0
2 4096 161095.0 160303.0 230813.0 230206.0
3 8192 356419.0 354928.0 521754.0 518052.0
4 16384 793562.0 788666.0 1046870.0 1040990.0
5 32768 1914250.0 1889850.0 2118340.0 2112680.0
6 65536 3490860.0 3479040.0 4693000.0 4673130.0
7 131072 8096130.0 8036290.0 9960240.0 9892870.0
8 262144 16616500.0 16525400.0 21230600.0 21068700.0
这篇关于Pandas数据框取消堆栈数据并创建新列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!