Pandas: Get values from column that appear more than X times
Question
I have a data frame in pandas and would like to get all the values of a certain column that appear more than X times. I know this should be easy, but somehow I am not getting anywhere with my current attempts.
Here is an example:
>>> df2 = pd.DataFrame([{"uid": 0, "mi":1}, {"uid": 0, "mi":2}, {"uid": 0, "mi":1}, {"uid": 0, "mi":1}])
>>> df2
   mi  uid
0   1    0
1   2    0
2   1    0
3   1    0
Now suppose I want to get all values from column "mi" that appear more than 2 times. The result should be:
>>> <fancy query>
array([1])
I have tried a couple of things with groupby and count, but I always end up with a series of the values and their respective counts, and I don't know how to extract from it the values whose count is more than X:
>>> df2.groupby('mi').mi.count() > 2
mi
1 True
2 False
dtype: bool
But how can I now use this to get the values of mi for which the condition is True?
Any hints appreciated :)
Recommended answer
Or how about this:
Create the table:
>>> import pandas as pd
>>> df2 = pd.DataFrame([{"uid": 0, "mi":1}, {"uid": 0, "mi":2}, {"uid": 0, "mi":1}, {"uid": 0, "mi":1}])
Get the number of occurrences of each value:
>>> vc = df2.mi.value_counts()
>>> print(vc)
1 3
2 1
Print out those that occur more than 2 times:
>>> print(vc[vc > 2].index[0])
1
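Note that `.index[0]` only gives the first matching value. If several values can pass the threshold, something along these lines should return all of them as an array, which is closer to the `array([1])` asked for in the question. This is just a sketch: the helper name `values_appearing_more_than` is made up for illustration, and it assumes a pandas version recent enough to have `Index.to_numpy()`. The last two lines show the same filtering applied to the boolean groupby result from the question.

```python
import pandas as pd

df2 = pd.DataFrame([{"uid": 0, "mi": 1}, {"uid": 0, "mi": 2},
                    {"uid": 0, "mi": 1}, {"uid": 0, "mi": 1}])


def values_appearing_more_than(series, x):
    """Return every distinct value of `series` that occurs more than `x` times.

    Hypothetical helper, not part of pandas.
    """
    counts = series.value_counts()              # value -> number of occurrences
    return counts[counts > x].index.to_numpy()  # keep only the frequent values


result = values_appearing_more_than(df2["mi"], 2)
print(result)

# Same idea, starting from the boolean Series built in the question:
mask = df2.groupby("mi").mi.count() > 2
from_mask = mask[mask].index.to_numpy()         # index labels where mask is True
print(from_mask)
```

Indexing a boolean Series with itself (`mask[mask]`) keeps only the rows that are True, so its index holds exactly the values of mi you are after.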