How to drop unique rows in a pandas dataframe?
Question
I am stuck on a seemingly easy problem: dropping unique rows in a pandas dataframe. Basically, the opposite of drop_duplicates().
Let's say this is my data:
     A  B  C
0  foo  0  A
1  foo  1  A
2  foo  1  B
3  bar  1  A
I would like to drop the rows where the combination of A and B is unique, i.e. I would like to keep only rows 1 and 2.
I tried the following:
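import pandas as pd

# Load Dataframe
df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"], "B": [0, 1, 1, 1], "C": ["A", "A", "B", "A"]})
# drop_duplicates() keeps the first row of each (A, B) pair by default
uniques = df[['A', 'B']].drop_duplicates()
duplicates = df[~df.index.isin(uniques.index)]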
But I only get row 2, because rows 0, 1, and 3 are all in uniques!
Recommended answer
Solutions for selecting all duplicated rows:
You can use duplicated with the subset parameter and keep=False to select all duplicates:
# assign to a new name so the original df stays available for the other solutions
df1 = df[df.duplicated(subset=['A', 'B'], keep=False)]
print(df1)
     A  B  C
1  foo  1  A
2  foo  1  B
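To see why keep=False matters here, the following sketch compares the three keep settings of duplicated on the question's data. The variable names (mask_first, mask_last, mask_all) are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# keep='first' (the default) flags every repeat except the first occurrence
mask_first = df.duplicated(subset=["A", "B"])
# keep='last' flags every repeat except the last occurrence
mask_last = df.duplicated(subset=["A", "B"], keep="last")
# keep=False flags every row whose (A, B) pair occurs more than once
mask_all = df.duplicated(subset=["A", "B"], keep=False)

print(mask_first.tolist())  # [False, False, True, False]
print(mask_last.tolist())   # [False, True, False, False]
print(mask_all.tolist())    # [False, True, True, False]
```

The default setting treats the first occurrence as "not a duplicate", which is the same reason the attempt in the question only returned row 2.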
Alternatively, use groupby with transform:
# keep rows whose (A, B) group contains more than one row
df1 = df[df.groupby(['A', 'B'])['A'].transform('size') > 1]
print(df1)
     A  B  C
1  foo  1  A
2  foo  1  B
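As a small illustration of what the transform step produces, the sketch below prints the per-row group size before it is compared with 1. The names group_size and repeated are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# transform('size') returns, for each row, the size of its (A, B) group,
# aligned to the original index of df
group_size = df.groupby(["A", "B"])["A"].transform("size")
print(group_size.tolist())  # [1, 2, 2, 1]

# rows that belong to a group with more than one member
repeated = df[group_size > 1]
print(repeated.index.tolist())  # [1, 2]
```

Because the result is aligned to the original index, it can be used directly as a boolean mask, and the threshold can be changed (for example `>= 3`) if only more frequent combinations should be kept.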
Slightly modified solutions for selecting all unique rows:
# invert the boolean mask with ~
df2 = df[~df.duplicated(subset=['A', 'B'], keep=False)]
print(df2)
     A  B  C
0  foo  0  A
3  bar  1  A
# keep rows whose (A, B) group contains exactly one row
df2 = df[df.groupby(['A', 'B'])['A'].transform('size') == 1]
print(df2)
     A  B  C
0  foo  0  A
3  bar  1  A
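Since the duplicated and unique selections are complements of the same mask, both can be obtained in one pass. The helper below is a hypothetical convenience wrapper (split_by_key_uniqueness is not a pandas function), sketched on the assumption that the original dataframe should stay unmodified.

```python
import pandas as pd

def split_by_key_uniqueness(frame, cols):
    """Return (repeated, unique) subsets of frame, judged on the columns in cols."""
    # True for every row whose key combination occurs more than once
    mask = frame.duplicated(subset=cols, keep=False)
    return frame[mask], frame[~mask]

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

repeated, unique = split_by_key_uniqueness(df, ["A", "B"])
print(repeated.index.tolist())  # [1, 2]
print(unique.index.tolist())    # [0, 3]
```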