Pair-wise duplicate removal from dataframe

Problem description

This seems like a simple problem but I can't seem to figure it out. I'd like to remove duplicates from a dataframe (df) if two columns have the same values, even if those values are in the reverse order. What I mean is, say you have the following data frame:

a <- c(rep("A", 3), rep("B", 3), rep("C",2))
b <- c('A','B','B','C','A','A','B','B')
df <- data.frame(a, b)

  a b
1 A A
2 A B
3 A B
4 B C
5 B A
6 B A
7 C B
8 C B

If I now remove duplicates, I get the following data frame:

df[duplicated(df),]

  a b
3 A B
6 B A
8 C B

However, I would also like to remove the row 6 in this data frame, since "A", "B" is the same as "B", "A". How can I do this automatically?

Ideally I could specify which two columns to compare since the data frames could have varying columns and can be quite large.

Thanks!

Recommended answer

One solution is to first sort each row of df:

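# Sort the values within each row so that ("B", "A") becomes ("A", "B").
# unlist() turns the one-row data frame into a plain vector before sorting;
# this assumes both columns are character (the default in R >= 4.0).
for (i in seq_len(nrow(df))) {
    df[i, ] <- sort(unlist(df[i, ]))
}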
df

  a b
1 A A
2 A B
3 A B
4 B C
5 A B
6 A B
7 B C
8 B C

At that point it's just a matter of removing the duplicated elements:

df = df[!duplicated(df),]
df
  a b
1 A A
2 A B
4 B C

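As thelatemail mentioned in the comments, your code actually keeps the duplicates rather than removing them: duplicated(df) is TRUE for every row that repeats an earlier one, so df[duplicated(df),] selects exactly those rows. You need to use !duplicated to remove them.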
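
Since the question asks for a way to name the two columns to compare, and mentions that the data frames can be large, here is a minimal sketch of a vectorized alternative that avoids the row-by-row loop. It is not part of the original answer: the helper name drop_unordered_dups is hypothetical, and the sketch assumes the two columns hold values that can be compared as character strings. It builds an order-independent key with pmin() and pmax() and leaves the original column values untouched.

```r
# Hypothetical helper (not from the original answer): drop rows whose values in
# two chosen columns repeat an earlier row, ignoring the order within the pair.
drop_unordered_dups <- function(df, col1, col2) {
  # Compare as character so that factor columns are handled as well.
  x <- as.character(df[[col1]])
  y <- as.character(df[[col2]])
  # pmin()/pmax() return the element-wise smaller and larger value,
  # so ("B", "A") and ("A", "B") both map to the key ("A", "B").
  key <- data.frame(lo = pmin(x, y), hi = pmax(x, y))
  # duplicated() flags later repeats; negate it to keep first occurrences.
  df[!duplicated(key), , drop = FALSE]
}

# Same example data as in the question.
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c("A", "B", "B", "C", "A", "A", "B", "B")
df <- data.frame(a, b)

result <- drop_unordered_dups(df, "a", "b")
print(result)
```

On the example data this keeps rows 1, 2 and 4, the same rows as the loop-based approach. Unlike the loop, it does not overwrite the values in df, so any other columns and the original order within each pair are preserved.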
