Select multiple groups from pandas groupby object
Question
I am experimenting with the groupby features of pandas, in particular:
gb = df.groupby('model')
gb.hist()
Since gb has 50 groups, the result is quite cluttered; I would like to explore the result for only the first 5 groups.
I found how to select a single group with groups or get_group (see "How to access pandas groupby dataframe by key"), but not how to select multiple groups directly.
The best I could do is:
groups = dict(list(gb))
subgroup = pd.concat(list(groups.values())[:5])  # list() is needed on Python 3, where dict views cannot be sliced
subgroup.groupby('model').hist()
Is there a more direct way?
Recommended answer
You can do something like this:
new_gb = pandas.concat( [ gb.get_group(group) for i,group in enumerate( gb.groups) if i < 5 ] ).groupby('model')
new_gb.hist()
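A variation on the same idea, sketched under a few assumptions: instead of concatenating the individual groups, you can filter the original frame with isin and regroup. The data below is made up for illustration (50 models with 20 rows each), and first_keys and subset_gb are just names chosen for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 50 models with 20 rows each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "model": np.repeat(np.arange(50), 20),
    "param1": rng.random(1000),
})

gb = df.groupby("model")

# gb.groups maps each group key to its row labels; take the first five keys
first_keys = list(gb.groups)[:5]

# Keep only the rows whose model is one of those keys, then regroup
subset_gb = df[df["model"].isin(first_keys)].groupby("model")
print(subset_gb.ngroups)
```

Calling subset_gb.hist() would then draw histograms for just those five groups.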
That said, I would approach it differently. You can use a collections.Counter object to get the group keys and their sizes quickly:
import collections
import numpy as np  # pandas.np was removed in pandas 2.0, so import numpy directly
import pandas
df = pandas.DataFrame.from_dict({'model': np.random.randint(0, 3, 10), 'param1': np.random.random(10), 'param2': np.random.random(10)})
# model param1 param2
#0 2 0.252379 0.985290
#1 1 0.059338 0.225166
#2 0 0.187259 0.808899
#3 2 0.773946 0.696001
#4 1 0.680231 0.271874
#5 2 0.054969 0.328743
#6 0 0.734828 0.273234
#7 0 0.776684 0.661741
#8 2 0.098836 0.013047
#9 1 0.228801 0.827378
model_groups = collections.Counter(df.model)
print(model_groups) #Counter({2: 4, 0: 3, 1: 3})
Now you can iterate over the Counter object like a dictionary and query the groups you want:
new_df = pandas.concat( [df.query('model==%d'%key) for key,val in model_groups.items() if val < 4 ] ) # for example, but you can select the models however you like
# model param1 param2
#2 0 0.187259 0.808899
#6 0 0.734828 0.273234
#7 0 0.776684 0.661741
#1 1 0.059338 0.225166
#4 1 0.680231 0.271874
#9 1 0.228801 0.827378
Now you can use the built-in pandas.DataFrame.groupby method:
gb = new_df.groupby('model')
gb.hist()
Since model_groups contains all of the group keys, you can pick from it however you like.
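As one possible way of picking from the Counter, here is a minimal sketch that keeps only the largest group using Counter.most_common. The model values mirror the sample frame above. The param1 values, top_keys and top_df are placeholders invented for this example.

```python
import collections
import pandas as pd

df = pd.DataFrame({
    "model": [2, 1, 0, 2, 1, 2, 0, 0, 2, 1],
    "param1": list(range(10)),  # placeholder values
})

model_groups = collections.Counter(df["model"])

# most_common(n) returns the n (key, count) pairs with the highest counts
top_keys = [key for key, _ in model_groups.most_common(1)]

# Keep only the rows that belong to the selected groups
top_df = df[df["model"].isin(top_keys)]
print(top_df)
```

Passing a larger n to most_common selects more groups. Keys with equal counts come back in the order the Counter first encountered them.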
If your model column contains string values (names or similar) instead of integers, everything works the same way; just change the query argument from 'model==%d'%key to 'model=="%s"'%key.
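If you would rather not build the query string by hand, query also lets you refer to a Python variable with an @ prefix, so the same expression works for integer and string keys. A small sketch, with a made-up frame and a hypothetical list named wanted:

```python
import pandas as pd

df = pd.DataFrame({
    "model": ["a", "b", "c", "a", "b", "c"],
    "param1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

wanted = ["a", "c"]  # hypothetical selection of group keys

# "@wanted" refers to the Python variable, so no manual quoting is needed
new_df = df.query("model in @wanted")
gb = new_df.groupby("model")
print(sorted(gb.groups))
```

This selects all of the wanted groups in one call, so the concat over per-key queries is not needed.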