How to drop unique rows in a pandas dataframe?

Question

I am stuck on a seemingly easy problem: dropping unique rows in a pandas dataframe. Basically, the opposite of drop_duplicates().

Let's say this is my data:

    A       B   C  
0   foo     0   A
1   foo     1   A
2   foo     1   B
3   bar     1   A

I would like to drop the rows whose combination of A and B is unique, i.e. I would like to keep only rows 1 and 2.

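I tried the following:

import pandas as pd

# Load Dataframe
df = pd.DataFrame({"A":["foo", "foo", "foo", "bar"], "B":[0,1,1,1], "C":["A","A","B","A"]})

# Attempt: take the rows kept by drop_duplicates(), then select everything else
uniques = df[['A', 'B']].drop_duplicates()
duplicates = df[~df.index.isin(uniques.index)]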

But I only get row 2, because rows 0, 1, and 3 are all in uniques!

Recommended answer

Solutions for selecting all duplicated rows:

You can use duplicated with the subset parameter and keep=False to select all duplicates:

# keep=False marks every row of a repeated (A, B) pair as a duplicate
dups = df[df.duplicated(subset=['A','B'], keep=False)]
print(dups)
     A  B  C
1  foo  1  A
2  foo  1  B
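
The keep parameter is also what explains the result in the question: drop_duplicates() defaults to keep='first', so the first row of each repeated pair counts as "not a duplicate". A minimal sketch comparing the three values of keep on the question's data (the mask_* names are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]})

# keep='first' (the default): every occurrence except the first is flagged
mask_first = df.duplicated(subset=["A", "B"], keep="first")
# keep='last': every occurrence except the last is flagged
mask_last = df.duplicated(subset=["A", "B"], keep="last")
# keep=False: all occurrences of a repeated (A, B) pair are flagged
mask_all = df.duplicated(subset=["A", "B"], keep=False)

print(mask_first.tolist())  # [False, False, True, False]
print(mask_last.tolist())   # [False, True, False, False]
print(mask_all.tolist())    # [False, True, True, False]
```

Only keep=False flags both row 1 and row 2, which is why it is the option needed here.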

Or use groupby with transform:

# transform('size') gives each row the size of its (A, B) group
dups = df[df.groupby(['A', 'B'])['A'].transform('size') > 1]
print(dups)
     A  B  C
1  foo  1  A
2  foo  1  B
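
To make the transform approach easier to follow, here is a small sketch that prints the intermediate Series before it is compared with 1. The name group_sizes is an assumption introduced only for this illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]})

# For each row, the number of rows sharing its (A, B) pair.
# The result has the same index as df, so it can be used as a row filter.
group_sizes = df.groupby(["A", "B"])["A"].transform("size")
print(group_sizes.tolist())  # [1, 2, 2, 1]

# Rows whose (A, B) pair occurs more than once
dups = df[group_sizes > 1]
print(dups.index.tolist())   # [1, 2]
```

Because the sizes are available as numbers, the same pattern can be adapted to other thresholds, for example keeping only pairs that occur at least three times.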

Slightly modified solutions for selecting all unique rows:

# invert the boolean mask with ~
uniques = df[~df.duplicated(subset=['A','B'], keep=False)]
print(uniques)
     A  B  C
0  foo  0  A
3  bar  1  A

# keep only the rows whose (A, B) group contains a single row
uniques = df[df.groupby(['A', 'B'])['A'].transform('size') == 1]
print(uniques)
     A  B  C
0  foo  0  A
3  bar  1  A
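
If both halves are needed at once, one option is to compute the mask a single time and reuse it. The helper split_by_duplicates below is hypothetical (it is not part of pandas) and is only a sketch of that idea:

```python
import pandas as pd

def split_by_duplicates(frame, subset):
    """Hypothetical helper: return (duplicated_rows, unique_rows) of frame,
    judged on the columns listed in subset."""
    # True for every row whose values in `subset` occur more than once
    mask = frame.duplicated(subset=subset, keep=False)
    return frame[mask], frame[~mask]

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]})

dup_rows, unique_rows = split_by_duplicates(df, ["A", "B"])
print(dup_rows.index.tolist())     # [1, 2]
print(unique_rows.index.tolist())  # [0, 3]
```

The original df is left untouched, so the two results always partition its rows.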
