Filtering a dataframe based on column value_counts (pandas)

Question

I'm trying out pandas for the first time. I have a dataframe with two columns: user_id and string. Each user_id may have several strings, so it can show up in the dataframe multiple times. I want to derive another dataframe from this one that lists only those user_ids that have at least 2 strings associated with them.

I tried df[df['user_id'].value_counts() > 1], which I thought was the standard way to do this, but it raises IndexingError: Unalignable boolean Series key provided. Can someone clear up my misunderstanding and provide the correct alternative?

Recommended answer

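I think you need transform, because the boolean mask must have the same index as df. value_counts returns a Series indexed by the unique user_id values rather than by the rows of df, so the mask cannot be aligned with df and pandas raises the error.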

df[df.groupby('user_id')['user_id'].transform('size') > 1]
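
To make the index difference concrete, here is a minimal sketch on a small made-up dataframe (the sample values and the names counts, sizes, filtered, and filtered_map are illustrative assumptions, not part of the original question). It also shows one possible alternative, mapping the value_counts result back onto the rows, which should give the same rows here.

```python
import pandas as pd

# Hypothetical sample data: "a" appears 3 times, "b" twice, "c" once.
df = pd.DataFrame({
    "user_id": ["a", "b", "a", "c", "b", "a"],
    "string": ["x1", "x2", "x3", "x4", "x5", "x6"],
})

# value_counts is indexed by the unique user_id values (one entry per id),
# so a mask built from it does not line up with the rows of df.
counts = df["user_id"].value_counts()

# transform("size") broadcasts each group's size back to every row,
# so the result shares df's index and can be used as a boolean mask.
sizes = df.groupby("user_id")["user_id"].transform("size")
filtered = df[sizes > 1]

# Alternative sketch: look up each row's count via map, which also
# produces a Series aligned with df.
filtered_map = df[df["user_id"].map(counts) > 1]

print(filtered)
```

Both approaches keep the original row order and row labels; only the rows whose user_id occurs once are dropped.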
