Removing usernames from a dataframe that do not appear a certain number of times?

Question

I am trying to understand the code below, which I found online but do not fully understand. I want to remove usernames that do not appear in my dataframe at least 4 times; apart from removing these names, I do not want to modify the dataframe in any other way. Does the following code solve this problem, and if so, can you explain how filter combined with the lambda achieves it? I have the following:

df.groupby('userName').filter(lambda x: len(x) > 4)

I am also open to alternative solutions or approaches that are easy to understand.

Recommended answer

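Yes, filter works for this, with one caveat: the condition > 4 keeps only names that appear more than 4 times, so for "at least 4 times" use >= 4. A faster solution for larger DataFrames is transform combined with boolean indexing: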

df[df.groupby('userName')['userName'].transform('size') > 4]
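To show how the two approaches work, here is a minimal sketch on made-up data (the names 'x', 'y' and the variable names are illustrative assumptions, not from the question). With filter, the lambda is called once per group and receives that group's sub-DataFrame; groups for which it returns False are dropped as a whole. With transform('size'), every row is labelled with the size of its group, and the comparison produces a boolean mask that selects rows.

```python
import pandas as pd

# Hypothetical data: 'x' appears 5 times, 'y' appears 2 times.
df = pd.DataFrame({'userName': ['x'] * 5 + ['y'] * 2})

# filter(): the lambda sees one sub-DataFrame per userName;
# len(x) is the number of rows in that group.
kept_by_filter = df.groupby('userName').filter(lambda x: len(x) > 4)

# transform('size'): a Series aligned with df's index,
# holding the group size for each row.
group_sizes = df.groupby('userName')['userName'].transform('size')

# Boolean mask: True for rows whose userName occurs more than 4 times.
mask = group_sizes > 4
kept_by_mask = df[mask]

print(group_sizes.tolist())
print(kept_by_mask)
```

Both results keep the original index and row order; only the rows of rare names are removed.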

Sample:

import pandas as pd

# 'a' appears 5 times, 'b' 3 times, 'c' 6 times
df = pd.DataFrame({'userName':['a'] * 5 + ['b'] * 3 + ['c'] * 6})

print (df.groupby('userName').filter(lambda x: len(x) > 4))
   userName
0         a
1         a
2         a
3         a
4         a
8         c
9         c
10        c
11        c
12        c
13        c

print (df[df.groupby('userName')['userName'].transform('size') > 4])
   userName
0         a
1         a
2         a
3         a
4         a
8         c
9         c
10        c
11        c
12        c
13        c
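Since the question also asks for alternatives that are easy to read, one possible option (not part of the original answer) is to count the names with value_counts and keep the rows whose name is frequent enough with isin. The sketch below uses >= 4 to match the "at least 4 times" wording of the question; the sample data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: 'a' appears 5 times, 'b' 3 times, 'c' exactly 4 times.
df = pd.DataFrame({'userName': ['a'] * 5 + ['b'] * 3 + ['c'] * 4})

# Number of occurrences of each userName.
counts = df['userName'].value_counts()

# Names that occur at least 4 times.
frequent = counts[counts >= 4].index

# Keep only rows whose userName is in that set; nothing else is modified.
result = df[df['userName'].isin(frequent)]

print(result)
```

Here 'c' is kept because it appears exactly 4 times, which a > 4 condition would have dropped.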

Timings:

import numpy as np
import pandas as pd

np.random.seed(123)
N = 1000000
L = np.random.randint(1000,size=N).astype(str)
df = pd.DataFrame({'userName': np.random.choice(L, N)})
print (df)

In [128]: %timeit (df.groupby('userName').filter(lambda x: len(x) > 1000))
1 loop, best of 3: 468 ms per loop

In [129]: %timeit (df[df.groupby('userName')['userName'].transform(len) > 1000])
1 loop, best of 3: 661 ms per loop

In [130]: %timeit (df[df.groupby('userName')['userName'].transform('size') > 1000])
10 loops, best of 3: 96.9 ms per loop
