Pandas groupby nlargest sum
I am trying to use groupby, nlargest, and sum in Pandas together, but I'm having trouble making them work.
State County Population
Alabama a 100
Alabama b 50
Alabama c 40
Alabama d 5
Alabama e 1
...
Wyoming a.51 180
Wyoming b.51 150
Wyoming c.51 56
Wyoming d.51 5
I want to use groupby to select by state, then get the top 2 counties by population. Then use only those top 2 county population numbers to get a sum for that state.
In the end, I'll have a list with each state and the population of its top 2 counties.
I can get the groupby and nlargest to work, but getting the sum of the nlargest(2) is a challenge.
The line I have right now is simply: df.groupby('State')['Population'].nlargest(2)
Solution
You can use apply after performing the groupby:
df.groupby('State')['Population'].apply(lambda grp: grp.nlargest(2).sum())
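As a check, here is a minimal runnable sketch using a subset of the sample data from the question (county names and population figures taken from the table above):

```python
import pandas as pd

# Subset of the sample data from the question
df = pd.DataFrame({
    'State': ['Alabama'] * 5 + ['Wyoming'] * 4,
    'County': ['a', 'b', 'c', 'd', 'e', 'a.51', 'b.51', 'c.51', 'd.51'],
    'Population': [100, 50, 40, 5, 1, 180, 150, 56, 5],
})

# For each state, keep the two largest county populations and sum them
result = df.groupby('State')['Population'].apply(lambda grp: grp.nlargest(2).sum())
print(result)
# Alabama -> 150, Wyoming -> 330
```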
I think the issue you're having is that df.groupby('State')['Population'].nlargest(2) returns a Series with a MultiIndex, so you can no longer do group-level operations on it. In general, if you want to perform multiple operations within a group, you'll need to use apply/agg.
The resulting output:
State
Alabama 150
Wyoming 330
EDIT
A slightly cleaner approach, as suggested by @cᴏʟᴅsᴘᴇᴇᴅ:
df.groupby('State')['Population'].nlargest(2).sum(level=0)
This is slightly slower than using apply on larger DataFrames, though.
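A side note not in the original answer: the level= argument to Series.sum was deprecated in pandas 1.x and removed in pandas 2.0, so on recent versions the same idea is spelled with an explicit second groupby on the index level:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Alabama', 'Alabama', 'Alabama', 'Wyoming', 'Wyoming', 'Wyoming'],
    'Population': [100, 50, 40, 180, 150, 56],
})

# nlargest(2) leaves a Series with a MultiIndex (State, original row label);
# grouping on level 0 replicates the old sum(level=0) on pandas >= 2.0
result = df.groupby('State')['Population'].nlargest(2).groupby(level=0).sum()
print(result)
# Alabama -> 150, Wyoming -> 330
```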
Using the following setup:
import numpy as np
import pandas as pd
from string import ascii_letters
n = 10**6
df = pd.DataFrame({'A': np.random.choice(list(ascii_letters), size=n),
'B': np.random.randint(10**7, size=n)})
I get the following timings:
In [3]: %timeit df.groupby('A')['B'].apply(lambda grp: grp.nlargest(2).sum())
103 ms ± 1.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [4]: %timeit df.groupby('A')['B'].nlargest(2).sum(level=0)
147 ms ± 3.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The slower performance is potentially caused by the level kwarg in sum performing a second groupby under the hood.
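For completeness, another common pattern (not from the original answer) is to sort once and then take the first rows per group, which avoids the Python-level lambda entirely:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Alabama', 'Alabama', 'Alabama', 'Wyoming', 'Wyoming', 'Wyoming'],
    'Population': [100, 50, 40, 180, 150, 56],
})

# Sort descending once, keep the top 2 rows of each state, then sum per state
top2 = df.sort_values('Population', ascending=False).groupby('State').head(2)
result = top2.groupby('State')['Population'].sum()
print(result)
# Alabama -> 150, Wyoming -> 330
```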