Drop rows corresponding to groups smaller than specified size


Problem description

I have a DataFrame of answers covering 100 question_id values and 50 user_id values. Each row represents a single question answered by a specific user. The table looks something like this.

user_id | question_id | timetaken | answer_1 | answer_2 |
1015    | 1           | 30        | A        | C        |
1015    | 2           | 45        | B        | B        |
1016    | 1           | 15        | A        | A        |
1016    | 2           | 55        | A        | D        |

I am trying to filter out the users who did not complete the test. My plan is to count how many times each user appears in the table: if user_id 1015 appears in the user_id column 100 times, I know they completed all 100 questions. Unfortunately, I cannot filter on question_id, because the questions are served in random order, so a user could answer only 5 questions and one of them could still have question_id = 100.

I thought this was my solution, but I couldn't work out how to count the occurrences of user_id.

Recommended answer

Use groupby and filter; it is succinct and intended for exactly this purpose.

# Keep only the rows of users who have at least 100 rows (one per question)
df1 = df.groupby('user_id').filter(lambda x: len(x) >= 100)
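To see the groupby/filter pattern on something runnable, here is a minimal sketch built on the question's sample table. User 1017 and the threshold `min_answers = 2` are assumptions made for illustration (the real threshold would be 100). The comparison is `>=`, so users with exactly the required number of rows are kept.

```python
import pandas as pd

# Toy data modeled on the question's table; user 1017 is a made-up
# incomplete user, and min_answers = 2 stands in for the real 100.
df = pd.DataFrame({
    "user_id": [1015, 1015, 1016, 1016, 1017],
    "question_id": [1, 2, 1, 2, 2],
    "timetaken": [30, 45, 15, 55, 20],
})
min_answers = 2

# Keep only the rows whose user_id group has at least min_answers rows.
complete = df.groupby("user_id").filter(lambda g: len(g) >= min_answers)
print(complete["user_id"].unique())
```

The lambda is called once per group, which is what makes this approach readable but comparatively slow when there are many distinct users.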


For better performance, use np.unique and map:

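import numpy as np
# Map each user_id to its row count, then keep rows whose user has at least 100
m = dict(zip(*np.unique(df.user_id, return_counts=True)))
df[df['user_id'].map(m) >= 100]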
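The count-and-map idea can be checked on a small frame. The sketch below uses made-up toy data and a hypothetical threshold `min_answers = 2` in place of 100. It also shows a pandas-only variant using `transform("size")`, which is not part of the original answer but builds the same boolean mask.

```python
import numpy as np
import pandas as pd

# Toy data; user 1017 is a made-up user with a single answer.
df = pd.DataFrame({
    "user_id": [1015, 1015, 1016, 1016, 1017],
    "question_id": [1, 2, 1, 2, 2],
})
min_answers = 2  # stand-in for the real threshold of 100

# Build a {user_id: row count} lookup, then map it onto every row.
counts = dict(zip(*np.unique(df["user_id"], return_counts=True)))
by_map = df[df["user_id"].map(counts) >= min_answers]

# Pandas-only variant: broadcast each group's size back to its rows.
sizes = df.groupby("user_id")["user_id"].transform("size")
by_transform = df[sizes >= min_answers]
```

Both variants avoid calling a Python function per group, which is the likely source of the speedup over groupby/filter.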
