Filtering dataframe based on column value_counts (pandas)
Question
I'm trying out pandas for the first time. I have a dataframe with two columns: user_id and string. Each user_id may have several strings, thus showing up in the dataframe multiple times. I want to derive another dataframe from this, one that lists only those user_ids that have at least 2 strings associated with them.
I tried df[df['user_id'].value_counts() > 1], which I thought was the standard way to do this, but it yields IndexingError: Unalignable boolean Series key provided. Can someone clarify the concept for me and provide the correct alternative?
Recommended answer
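I think you need transform, because the boolean mask must have the same index as df. value_counts returns a Series indexed by the unique user_id values rather than by df's row labels, so the mask cannot be aligned with df and the error is raised.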
df[df.groupby('user_id')['user_id'].transform('size') > 1]
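To make the index mismatch concrete, here is a minimal sketch on a small made-up dataframe (the sample values are assumptions for illustration, not data from the question). It applies the transform approach above and, for comparison, two alternatives that are not part of the original answer: mapping the value_counts result back onto the column, and selecting with isin.

```python
import pandas as pd

# Hypothetical sample data: user 1 has 2 strings, user 2 has 1, user 3 has 3.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "string": ["a", "b", "c", "d", "e", "f"],
})

# value_counts() is indexed by the unique user_id values, not by df's row labels,
# which is why it cannot be used directly as a boolean mask on df.
vc = df["user_id"].value_counts()

# transform('size') broadcasts each group's row count back onto df's own index,
# so the resulting mask lines up with df row by row.
counts = df.groupby("user_id")["user_id"].transform("size")
filtered = df[counts > 1]

# Alternative 1 (assumption, not from the answer): map the counts back onto the column.
filtered_map = df[df["user_id"].map(vc) > 1]

# Alternative 2 (assumption, not from the answer): keep rows whose user_id is in the frequent set.
filtered_isin = df[df["user_id"].isin(vc[vc > 1].index)]

print(filtered)
```

All three variants should keep the rows for users 1 and 3 and drop the single row for user 2. The transform version stays closest to the answer; the map and isin versions may read more naturally if value_counts has already been computed.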