Select multiple groups from pandas groupby object

Views: 105 | Published: 2020/5/23 22:53:21 | Tags: python pandas


Question

I am experimenting with the groupby features of pandas, in particular:

gb = df.groupby('model')
gb.hist()

Since gb has 50 groups, the result is quite cluttered. I would like to explore the result for only the first 5 groups.

I found how to select a single group with groups or get_group (How to access pandas groupby dataframe by key), but not how to select multiple groups directly. The best I could do is:

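groups = dict(list(gb))
# dict views are not subscriptable in Python 3, so convert to a list before slicing
subgroup = pd.concat(list(groups.values())[:5])  # first 5 groups
subgroup.groupby('model').hist()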

Is there a more direct way?

Answer

You can do something like:

new_gb = pandas.concat([gb.get_group(group) for i, group in enumerate(gb.groups) if i < 5]).groupby('model')
new_gb.hist()
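
As an alternative to concatenating individual groups, the same selection can be sketched with a boolean mask. This is not the answerer's method, just a minimal illustration: the sample data, the column name `param1`, and the variable names `first_keys` and `subset` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 8 models with 5 rows each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "model": np.repeat(np.arange(8), 5),
    "param1": rng.random(40),
})

gb = df.groupby("model")

# Take the keys of the first 5 groups, then keep only the rows in those groups
first_keys = list(gb.groups)[:5]
subset = df[df["model"].isin(first_keys)]

# Regroup the reduced frame; new_gb.hist() would now draw only 5 groups
new_gb = subset.groupby("model")
```

Filtering with `isin` keeps the rows in their original order and avoids building one DataFrame per group before concatenating.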

However, I would approach it differently. You can use a collections.Counter object to get the groups and their sizes quickly:

import collections

import numpy as np  # pandas.np is no longer available in current pandas
import pandas

df = pandas.DataFrame({'model': np.random.randint(0, 3, 10), 'param1': np.random.random(10), 'param2': np.random.random(10)})
#   model    param1    param2
#0      2  0.252379  0.985290
#1      1  0.059338  0.225166
#2      0  0.187259  0.808899
#3      2  0.773946  0.696001
#4      1  0.680231  0.271874
#5      2  0.054969  0.328743
#6      0  0.734828  0.273234
#7      0  0.776684  0.661741
#8      2  0.098836  0.013047
#9      1  0.228801  0.827378
model_groups = collections.Counter(df.model)
print(model_groups) #Counter({2: 4, 0: 3, 1: 3})
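
For comparison, the same counts can be obtained without leaving pandas by using `Series.value_counts`. The sketch below is an addition, not part of the original answer; it hard-codes the `model` column from the sample output above so the result is reproducible.

```python
import collections

import pandas as pd

# The 'model' column from the sample DataFrame shown above
df = pd.DataFrame({"model": [2, 1, 0, 2, 1, 2, 0, 0, 2, 1]})

# Counter and value_counts both map each model to its number of rows
model_groups = collections.Counter(df["model"])
counts = df["model"].value_counts()
```

`counts` is a Series indexed by model, so it can be filtered with ordinary boolean indexing such as `counts[counts < 4]`.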

Now you can iterate over the Counter object like a dictionary and query the groups you want:

new_df = pandas.concat( [df.query('model==%d'%key) for key,val in model_groups.items() if val < 4 ] ) # for example, but you can select the models however you like  
#   model    param1    param2
#2      0  0.187259  0.808899
#6      0  0.734828  0.273234
#7      0  0.776684  0.661741
#1      1  0.059338  0.225166
#4      1  0.680231  0.271874
#9      1  0.228801  0.827378

Now you can use the built-in pandas.DataFrame.groupby function:

gb = new_df.groupby('model')
gb.hist()

Since model_groups contains all of the groups, you can pick from it as you wish.

If your model column contains string values (names or similar) instead of integers, everything works the same way; just change the query argument from 'model==%d'%key to 'model=="%s"'%key.
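
One way to avoid switching format strings between integer and string keys is to collect the wanted keys in a list and filter once. The following is a minimal sketch under assumed data: the string model names and the variable names `wanted`, `new_df`, and `same_df` are made up for illustration.

```python
import collections

import pandas as pd

# Hypothetical data with string model names
df = pd.DataFrame({
    "model": ["a", "b", "c", "a", "b", "a", "c", "c", "a", "b"],
    "param1": range(10),
})

model_groups = collections.Counter(df["model"])

# Keys of the groups to keep; the same line works for integer keys
wanted = [key for key, val in model_groups.items() if val < 4]

# Boolean-mask filter, independent of the key type
new_df = df[df["model"].isin(wanted)]

# Equivalent query: '@' refers to a local Python variable
same_df = df.query("model in @wanted")
```

Unlike the concat approach, both filters keep the rows in their original order rather than grouping them by model; a subsequent `groupby('model')` behaves the same either way.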
