Pandas sum by groupby, but exclude certain columns
Problem description
What is the best way to do a groupby on a Pandas dataframe while excluding some columns from the aggregation? For example, I have the following dataframe:
Code  Country      Item_Code  Item   Ele_Code  Unit  Y1961  Y1962  Y1963
2     Afghanistan  15         Wheat  5312      Ha    10     20     30
2     Afghanistan  25         Maize  5312      Ha    10     20     30
4     Angola       15         Wheat  7312      Ha    30     40     50
4     Angola       25         Maize  7312      Ha    30     40     50
I want to group by the columns Country and Item_Code and compute the sum only for the rows falling under the columns Y1961, Y1962 and Y1963. The resulting dataframe should look like this:
Code  Country      Item_Code  Item  Ele_Code  Unit  Y1961  Y1962  Y1963
2     Afghanistan  15         C3    5312      Ha    20     40     60
4     Angola       25         C4    7312      Ha    60     80     100
Right now, I am doing this:
df.groupby('Country').sum()
However, this adds up the values in the Item_Code column as well. Is there any way I can specify which columns to include in the sum() operation and which ones to exclude?
Solution
You can select the columns of a groupby:
In [11]: df.groupby(['Country', 'Item_Code'])[["Y1961", "Y1962", "Y1963"]].sum()
Out[11]:
                       Y1961  Y1962  Y1963
Country     Item_Code
Afghanistan 15            10     20     30
            25            10     20     30
Angola      15            30     40     50
            25            30     40     50
Note that the list passed must be a subset of the dataframe's columns; otherwise you'll see a KeyError.
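As a self-contained sketch of the approach above, the snippet below rebuilds the sample dataframe from the question and applies the column selection. The variable names (`year_cols`, `by_country_item`, `by_country`, `missing_column_error`) and the non-existent column `Y1999` are illustrative assumptions, not part of the original answer. Grouping by Country alone is included only because it appears to be what yields the per-country totals (20, 40, 60 and 60, 80, 100) shown in the question's desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Code": [2, 2, 4, 4],
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Item": ["Wheat", "Maize", "Wheat", "Maize"],
    "Ele_Code": [5312, 5312, 7312, 7312],
    "Unit": ["Ha", "Ha", "Ha", "Ha"],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

# Hypothetical helper name: the columns to include in the sum.
year_cols = ["Y1961", "Y1962", "Y1963"]

# Group by both keys; only the selected columns are summed.
by_country_item = df.groupby(["Country", "Item_Code"])[year_cols].sum()

# Group by Country alone; as_index=False keeps Country as a regular column.
by_country = df.groupby("Country", as_index=False)[year_cols].sum()

# Selecting a column that does not exist ("Y1999" here) raises KeyError.
try:
    df.groupby("Country")[["Y1961", "Y1999"]].sum()
    missing_column_error = False
except KeyError:
    missing_column_error = True
```

Because Item_Code is used as a grouping key or simply left out of the selection, it is never summed. If the grouping keys are wanted back as ordinary columns, calling `.reset_index()` on `by_country_item` is one option.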