Filtering dataframe based on column value_counts (pandas)

Views: 558 Published: 2020/5/24 2:43:57 Tags: python pandas

Problem description

I'm trying out pandas for the first time. I have a dataframe with two columns: user_id and string. Each user_id may have several strings, so it can appear in the dataframe multiple times. I want to derive another dataframe from this one, listing only those user_ids that have at least 2 strings associated with them.

I tried df[df['user_id'].value_counts() > 1], which I thought was the standard way to do this, but it yields IndexingError: Unalignable boolean Series key provided. Can someone clear up my understanding and provide the correct alternative?

Recommended answer

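I think you need transform, because the boolean mask must have the same index as df. value_counts returns a Series indexed by the unique user_id values rather than by the rows of df, so the mask cannot be aligned with df and pandas raises the error.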

df[df.groupby('user_id')['user_id'].transform('size') > 1]
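To make the index difference concrete, here is a minimal sketch. The sample data is hypothetical (the question does not show its dataframe), and the isin variant at the end is an alternative added for comparison, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the question only names the two columns.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "string": ["a", "b", "c", "d", "e", "f"],
})

# value_counts() yields one entry per distinct user_id, indexed by user_id,
# so it does not share df's row index and cannot serve as a row mask.
counts = df["user_id"].value_counts()

# transform('size') broadcasts each group's size back onto every row,
# giving a Series with the same index as df.
sizes = df.groupby("user_id")["user_id"].transform("size")
filtered = df[sizes > 1]

# Alternative (not from the original answer): select the frequent ids
# from value_counts() and test row membership with isin().
alt = df[df["user_id"].isin(counts.index[counts > 1])]

print(filtered)
```

Both filtered and alt keep only the rows whose user_id occurs more than once; in this sample data that drops the single row for user_id 2.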
