Pandas better way for Sorting, Grouping, Summing
Question
New to Pandas, so I'm wondering whether there is a more Pandithic (coining it!) way to sort some data, group it, and then sum part of it. The problem is to find the 3 largest values in each group of a series of values and then sum only those.
census_cp is a dataframe with information about the counties of each state. My current solution is:
cen_sort = census_cp.groupby('STNAME').head(3)
cen_sort = cen_sort.groupby('STNAME').sum().sort_values(by='CENSUS2010POP', ascending=False).head(n=3)
cen_sort = cen_sort.reset_index()
print(cen_sort['STNAME'].values.tolist())
I'm specifically curious whether there is a better way to do this, and also why I can't put the sum at the end of the previous line and chain together what seem to me to be obviously connected steps (get the top 3 of each group and add them together).
Recommended answer
I think you can use head with sum first with groupby, and then nlargest:
df = census_cp.groupby('STNAME') \
              .apply(lambda x: x.head(3).sum(numeric_only=True)) \
              .reset_index() \
              .nlargest(3, 'CENSUS2010POP')
Sample:
import pandas as pd
census_cp = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                          'CENSUS2010POP':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})
print (census_cp)
CENSUS2010POP STNAME
0 4 a
1 5 b
2 6 s
3 5 c
4 6 s
5 2 c
6 3 b
7 4 c
8 5 d
9 6 b
10 4 c
11 5 s
12 4 s
13 3 c
14 6 a
15 5 e
df = census_cp.groupby('STNAME') \
.apply(lambda x: x.head(3).sum(numeric_only=True)) \
.reset_index() \
.nlargest(3, 'CENSUS2010POP')
print (df)
STNAME CENSUS2010POP
5 s 17
1 b 14
2 c 11
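Note that head(3) takes the first three rows of each state in their existing order, not the three largest values. A minimal sketch, reusing the sample frame above, of how the two differ for group c (the variable names here are only for illustration):

```python
import pandas as pd

census_cp = pd.DataFrame({
    'STNAME': list('abscscbcdbcsscae'),
    'CENSUS2010POP': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5],
})

# Population values for state 'c' only, in their original row order
c_pop = census_cp.loc[census_cp['STNAME'] == 'c', 'CENSUS2010POP']

first_three_sum = c_pop.head(3).sum()        # rows 3, 5, 7: 5 + 2 + 4
largest_three_sum = c_pop.nlargest(3).sum()  # three largest: 5 + 4 + 4

print(first_three_sum, largest_three_sum)
```

So the head-based version only gives the "top 3" if the data is already sorted by population within each state.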
If you need the top 3 twice, first the 3 largest values per group with nlargest and then the 3 largest of the summed values, use:
df1 = census_cp.groupby('STNAME')['CENSUS2010POP'] \
               .apply(lambda x: x.nlargest(3).sum()) \
               .nlargest(3) \
               .reset_index()
print (df1)
STNAME CENSUS2010POP
0 s 17
1 b 14
2 c 13
Or:
df1 = census_cp.groupby('STNAME')['CENSUS2010POP'].nlargest(3) \
               .groupby(level=0) \
               .sum() \
               .nlargest(3) \
               .reset_index()
print (df1)
STNAME CENSUS2010POP
0 s 17
1 b 14
2 c 13
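On the chaining part of the question: as far as I can tell, groupby(...).head(3) returns an ordinary DataFrame rather than a grouped object, so a .sum() appended directly would add up all the remaining rows at once. The rows have to be grouped again before summing. A sketch under that reading, reusing the sample data and sorting first so that head(3) picks the largest values (the name top3 is just for illustration):

```python
import pandas as pd

census_cp = pd.DataFrame({
    'STNAME': list('abscscbcdbcsscae'),
    'CENSUS2010POP': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5],
})

top3 = (
    census_cp.sort_values('CENSUS2010POP', ascending=False)
    .groupby('STNAME')
    .head(3)                             # plain DataFrame: up to 3 rows per state
    .groupby('STNAME')['CENSUS2010POP']  # group again before summing
    .sum()
    .nlargest(3)                         # 3 states with the largest sums
    .reset_index()
)
print(top3)
```

This avoids apply with a lambda, at the cost of a second groupby.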