如何总结不同的groupby组合? [英] How to summarize on different groupby combinations?
问题描述
我正在按郡县列出前三名的作物表.一些县以相同的顺序拥有相同的农作物品种.其他县具有相同的农作物品种,但顺序不同.
I am compiling a table of top-3 crops by county. Some counties have the same crop varieties in the same order. Other counties have the same crop varieties in a different order.
df1 = pd.DataFrame( {
"County" : ["Harney", "Baker", "Wheeler", "Hood River", "Wasco" , "Morrow","Union","Lake"] ,
"Crop1" : ["grain", "melons", "melons", "apples", "pears", "raddish","pears","pears"],
"Crop2" : ["melons","grain","grain","melons","carrots","pears","carrots","carrots"],
"Crop3": ["apples","apples","apples","grain","raddish","carrots","raddish","raddish"],
"Total_pop": [2000,1500,3000,1500,2000,2500,2700,2000]} )
我可以对Crop1,Crop2和Crop3进行分组,并获得total_pop的总和:
I can do a groupby on Crop1, Crop2 and Crop3 and get the sum of total_pop:
df1_grouped=df1.groupby(['Crop1',"Crop2","Crop3"])['Total_pop'].sum().reset_index()
这给了我特定农作物组合的总数:
That gives me the total for specific combinations of the crops:
df1_grouped
apples melons grain 1500
grain melons apples 2000
melons grain apples 4500
pears carrots raddish 6700
raddish pears carrots 2500
我想得到的是不同作物组合的总人口-不管列出的作物是crop1,crop2还是crop3.期望的结果将是这样:
What I would like, though, is to get the total population on different combinations of crops -- irrespective of whether the listed crop was crop1, crop2, or crop3. The desired result would be this:
apples melons grain 8000
pears carrots raddish 9200
谢谢您的指导.
推荐答案
方法1:
合并crop
列
>>> df1['combined_temp'] = df1.apply(lambda x : list([x['Crop1'],
... x['Crop2'],
... x['Crop3']]),axis=1)
>>> df1.head()
County Crop1 Crop2 Crop3 Total_pop combined_temp
0 Harney grain melons apples 2000 [grain, melons, apples]
1 Baker melons grain apples 1500 [melons, grain, apples]
2 Wheeler melons grain apples 3000 [melons, grain, apples]
3 Hood River apples melons grain 1500 [apples, melons, grain]
4 Wasco pears carrots raddish 2000 [pears, carrots, raddish]
将其设为已排序的元组
>>> df1['sorted'] = df1.apply(lambda x : tuple(sorted(x['combined_temp'])),axis=1)
>>> df1.head()
County Crop1 Crop2 ... Total_pop combined_temp sorted
0 Harney grain melons ... 2000 [grain, melons, apples] (apples, grain, melons)
1 Baker melons grain ... 1500 [melons, grain, apples] (apples, grain, melons)
2 Wheeler melons grain ... 3000 [melons, grain, apples] (apples, grain, melons)
3 Hood River apples melons ... 1500 [apples, melons, grain] (apples, grain, melons)
4 Wasco pears carrots ... 2000 [pears, carrots, raddish] (carrots, pears, raddish)
然后按操作进行常规分组
then proceed to your normal group by operation
>>> df1_grouped = df1.groupby(['sorted'])['Total_pop'].sum().reset_index()
>>> df1_grouped
sorted Total_pop
0 (apples, grain, melons) 8000
1 (carrots, pears, raddish) 9200
方法2:
基于
Method 2: A shorted version based on the answer by aws-apprentice
df = df1.copy()
grouping_cols = ['Crop1', 'Crop2', 'Crop3']
df[grouping_cols] = pd.DataFrame(df.loc[:, grouping_cols] \
.apply(set, axis=1) \
.apply(sorted)
.values \
.tolist(), columns=grouping_cols)
>>> df.head()
County Crop1 Crop2 Crop3 Total_pop
0 Harney apples grain melons 2000
1 Baker apples grain melons 1500
2 Wheeler apples grain melons 3000
3 Hood River apples grain melons 1500
4 Wasco carrots pears raddish 2000
现在按组分组
>>> df.groupby(grouping_cols).Total_pop.sum()
Crop1 Crop2 Crop3
apples grain melons 8000
carrots pears raddish 9200
Name: Total_pop, dtype: int64
但我个人更喜欢使用numpy的答案
这篇关于如何总结不同的groupby组合?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!