Duplicates in multiple columns

Problem description

I have a data frame like so:

> df
  a  b c    d
1 1  2 A 1001
2 2  4 B 1002
3 3  6 B 1002
4 4  8 C 1003
5 5 10 D 1004
6 6 12 D 1004
7 7 13 E 1005
8 8 14 E 1006

I want to remove the rows where there are repeated values in column c AND column d. So in this example rows 2, 3, 5 and 6 would be removed.

I have used this, which works:

> df[!(df$c %in% df$c[duplicated(df$c)] & df$d %in% df$d[duplicated(df$d)]), ]
  a  b c    d
1 1  2 A 1001
4 4  8 C 1003
7 7 13 E 1005
8 8 14 E 1006

But it seems clunky, and I can't help thinking there is a better way. Any suggestions?

In case anyone wants to re-create the data frame, here is the dput:

df = structure(list(a = c(1, 2, 3, 4, 5, 6, 7, 8), b = c(2, 4, 6, 
8, 10, 12, 13, 14), c = structure(c(1L, 2L, 2L, 3L, 4L, 4L, 5L, 
5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), 
    d = c(1001, 1002, 1002, 1003, 1004, 1004, 1005, 1006)), .Names = c("a", 
"b", "c", "d"), row.names = c(NA, -8L), class = "data.frame")

Recommended answer

It works if you call duplicated twice on the two columns, once scanning forward and once with fromLast = TRUE, and drop every row flagged by either call:

df[!(duplicated(df[c("c","d")]) | duplicated(df[c("c","d")], fromLast = TRUE)), ]

  a  b c    d
1 1  2 A 1001
4 4  8 C 1003
7 7 13 E 1005
8 8 14 E 1006
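
To see why two calls are needed, the expression above can be split into its two passes. The following is only a sketch: the data frame is rebuilt with data.frame() rather than the dput, column c is a plain character vector here instead of a factor, and the names key, dup_forward, dup_backward and result are illustrative, not part of the original answer.

```r
# Rebuild the example data (c is character here, an assumption for brevity)
df <- data.frame(
  a = 1:8,
  b = c(2, 4, 6, 8, 10, 12, 13, 14),
  c = c("A", "B", "B", "C", "D", "D", "E", "E"),
  d = c(1001, 1002, 1002, 1003, 1004, 1004, 1005, 1006)
)

# Treat the (c, d) pair as a single key
key <- df[c("c", "d")]

# Forward pass: flags the second and later copies of each pair (rows 3 and 6)
dup_forward <- duplicated(key)

# Backward pass: flags every copy except the last one (rows 2 and 5)
dup_backward <- duplicated(key, fromLast = TRUE)

# Keep only the rows that neither pass flagged
result <- df[!(dup_forward | dup_backward), ]
print(result)
```

A single duplicated() call would leave the first copy of each repeated pair in place, which is why the two passes are combined with |. Note also that this checks the (c, d) pair as a whole, whereas the expression in the question tests each column separately. Both give the same rows on this data, but they could differ on a data frame where a row's c value and d value each recur in different rows.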
