Unmelt Pandas DataFrame
Question
I have a pandas DataFrame with two id variables:
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'num': [10, 10, 12, 13, 14, 15],
                   'q': ['a', 'b', 'd', 'a', 'b', 'z'],
                   'v': [2, 4, 6, 8, 10, 12]})
   id  num  q   v
0   1   10  a   2
1   1   10  b   4
2   1   12  d   6
3   2   13  a   8
4   2   14  b  10
5   3   15  z  12
I can pivot the table with:
df.pivot(index='id', columns='q', values='v')
And end up with something close:
q     a    b    d    z
id
1     2    4    6  NaN
2     8   10  NaN  NaN
3   NaN  NaN  NaN   12
However, what I really want is (the original unmelted form):
id  num    a    b    d    z
 1   10    2    4  NaN  NaN
 1   12  NaN  NaN    6  NaN
 2   13    8  NaN  NaN  NaN
 2   14  NaN   10  NaN  NaN
 3   15  NaN  NaN  NaN   12
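In other words:
- 'id' and 'num' are my index (normally I have only seen either 'id' or 'num' used as the index, but I need both because I am trying to recover the original unmelted form)
- 'q' supplies my columns
- 'v' supplies the values in the table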
Update
However, the format is not quite the same as what I want above.
Recommended answer
You're really close, slaw. Just rename your column index to None and you've got what you want.
df2 = df.pivot_table(index=['id', 'num'], columns='q')
df2.columns = df2.columns.droplevel().rename(None)
df2.reset_index().fillna("null").to_csv("test.csv", sep="\t", index=False)
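For reference, here is a minimal self-contained sketch of the same idea, using the sample data from the question. It differs slightly from the snippet above: it passes values="v" explicitly (an assumption that 'v' is the only value column you care about), so the columns never become a MultiIndex and there is no level to drop. The name wide is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "num": [10, 10, 12, 13, 14, 15],
    "q": ["a", "b", "d", "a", "b", "z"],
    "v": [2, 4, 6, 8, 10, 12],
})

# Pivot on both id columns. The default aggfunc is the mean, which leaves
# the values unchanged here because every (id, num, q) combination is unique.
wide = df.pivot_table(index=["id", "num"], columns="q", values="v")

# Drop the "q" label from the column index, then turn the (id, num)
# index back into ordinary columns.
wide.columns.name = None
wide = wide.reset_index()

print(wide)
```

The result has one row per (id, num) pair and the columns id, num, a, b, d, z, with NaN wherever a given 'q' value did not occur for that pair.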
Note that the 'v' column is expected to be numeric by default so that it can be aggregated. Otherwise, pandas will error out with:
DataError: No numeric types to aggregate
To resolve this, you can specify your own aggregation function by using a custom lambda function:
df2 = df.pivot_table(index=['id', 'num'], columns='q', aggfunc=lambda x: x)
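As a possible alternative to the lambda, the built-in "first" aggregation also handles non-numeric values, since it simply keeps the first entry in each group. The sketch below is not part of the original answer. It assumes a hypothetical variant of the sample data in which 'v' holds strings, and the names df_str and wide_str are illustrative only.

```python
import pandas as pd

# Hypothetical variant of the sample data in which 'v' is non-numeric.
df_str = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "num": [10, 10, 12, 13, 14, 15],
    "q": ["a", "b", "d", "a", "b", "z"],
    "v": ["two", "four", "six", "eight", "ten", "twelve"],
})

# "first" keeps the first value of each (id, num, q) group, so no numeric
# aggregation is attempted on the string column.
wide_str = df_str.pivot_table(
    index=["id", "num"], columns="q", values="v", aggfunc="first"
)
wide_str.columns.name = None
wide_str = wide_str.reset_index()

print(wide_str)
```

This only reproduces the original values faithfully when each (id, num, q) combination appears once. If duplicates exist, "first" silently discards all but one of them.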