pandas 以指定的(附加)列作为枢轴 [英] Pandas pivot with specified (additional) columns

查看:77
本文介绍了 pandas 以指定的(附加)列作为枢轴的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个不同的元组列表,它们被转换为熊猫数据框:

I have a two different list of tuples that are converted into a pandas dataframe:

ls1 = [(1,"A",2),(1,"B",1),(1,"C",3),(2,"A",4),(2,"B",4,),(2,"C",5)]
ls2 = [(1,"A",2),(1,"C",3),(1,"B",1),(1,"D",6),(2,"A",4),(2,"C",5),(2,"B",4,)]
df1 = pandas.DataFrame(ls1, columns=['ID', 'Class', 'count'])
df2 = pandas.DataFrame(ls2, columns=['ID', 'Class', 'count'])

现在,我想从两个数据框中创建一个数据透视表,其列名称分别为"A","B","C","D"类.因此,所有四个列名称(如果可能,也可以按指定的顺序)应该存在于结果数据透视表中.如果没有ID-Class组合的计数,则应使用例如来填充. NaN.

Now I'd like to make a pivot table from both dataframes with the column names for the classes "A", "B", "C", "D". So, all four column names (if possible also in a specified order) should exist in the resulting pivot table. If there are no counts for a ID-Class combination these should be filled with e.g. NaN.

dfpivot1 = df1.pivot(index='ID', columns='Class', values='count')
dfpivot2 = df2.pivot(index='ID', columns='Class', values='count')

>>> dfpivot1
Class  A  B  C
ID            
1      2  1  3
2      4  4  5
>>> 

使用熊猫的.pivot提供df1的数据透视表,但是只有三个Class列("A","B","C").因此,需要修改dfpivot1使其具有列"A","B","C"和"D",从而与dfpivot2的列完全匹配.由于我是从元组列表开始的,其他方法(不使用pandas)可能也会很有趣.

Using the .pivot from pandas provides a pivot table for df1, but with only three Class-columns ("A","B","C"). So, dfpivot1 needs to be modified to have the columns "A","B","C" AND "D" and thus, exactly match the columns of dfpivot2. As I am starting from a list of tuples also other approaches (without using pandas) might be interesting.

推荐答案

我认为您需要添加 reindex 填充所有缺少的列的NaN:

I think you need add reindex_axis or reindex for fill NaNs for all missing columns:

cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count').reindex_axis(cols, axis=1)
print (dfpivot1)
Class  A  B  C   D
ID                
1      2  1  3 NaN
2      4  4  5 NaN

也可以指定fill_value参数:

cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count')
              .reindex_axis(cols, fill_value=0, axis=1)
print (dfpivot1)
Class  A  B  C  D
ID               
1      2  1  3  0
2      4  4  5  0


cols = ['A','B','C','D']
dfpivot1 = df1.pivot(index='ID', columns='Class', values='count').reindex(columns=cols)
print (dfpivot1)
Class  A  B  C   D
ID                
1      2  1  3 NaN
2      4  4  5 NaN

这篇关于 pandas 以指定的(附加)列作为枢轴的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆