Pandas groupby: get size of a group knowing its id (from .grouper.group_info[0])


Question

In the following snippet, data is a pandas.DataFrame and indices is a set of columns of data. After grouping the data with groupby, I am interested in the ids of the groups, but only of those groups whose size reaches a threshold (say, 3).

group_ids = data.groupby(list(data.columns[list(indices)])).grouper.group_info[0]

Now, knowing the id of a group, how can I find which groups have a size greater than or equal to 3? I only want the ids of groups with a certain size.

#TODO: filter out ids from group_ids which correspond to groups with sizes < 3 

Recommended answer

One way is to use the size method of the groupby:

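g = data.groupby(...)
size = g.size()   # Series of row counts, indexed by group key
size[size > 3]    # groups larger than the threshold (use >= to include it)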

For example, here there is only one group of size > 1:

In [11]: df = pd.DataFrame([[1, 2], [3, 4], [1,6]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  3  4
2  1  6 

In [13]: g = df.groupby('A')

In [14]: size = g.size()

In [15]: size[size > 1]
Out[15]:
A
1    2
dtype: int64
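
The size Series above is indexed by group key, whereas the question starts from per-row integer group ids. The following is a minimal sketch of how the two could be connected. It assumes that the public ngroup() method can stand in for the ids taken from .grouper.group_info[0], and the threshold value is chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [1, 6]], columns=["A", "B"])
g = df.groupby("A")

# Per-row integer group ids from the public ngroup() method
# (assumption: these play the same role as .grouper.group_info[0]).
row_ids = g.ngroup().to_numpy()

# counts[i] is the number of rows whose group id is i.
counts = np.bincount(row_ids)

threshold = 2  # illustrative value, not taken from the question
large_ids = np.flatnonzero(counts >= threshold)

# Boolean mask selecting the rows that belong to a large enough group.
in_large = np.isin(row_ids, large_ids)
```

Here large_ids holds the ids of the groups that meet the threshold, and in_large can be used to index df or the original id array.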

If you were only interested in restricting the DataFrame to the rows that belong to large groups, you could use the filter method:

In [21]: g.filter(lambda x: len(x) > 1)
Out[21]:
   A  B
0  1  2
2  1  6
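
As a possible alternative to calling a Python lambda once per group, the group sizes can be broadcast back onto the rows with transform("size") and then used as a boolean mask. This is only a sketch on the same toy frame. Selecting column "B" before transform is an arbitrary choice made for the example.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [1, 6]], columns=["A", "B"])

# For every row, the size of the group that row belongs to.
row_sizes = df.groupby("A")["B"].transform("size")

# Keep only the rows whose group has more than one member.
large = df[row_sizes > 1]
```

On this frame, large should contain the same rows as the filter call above, with the original index preserved.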
