Checking duplicates across two columns in R

Question

For example, my data set looks like this:

  Var1 Var2 value
1  ABC  BCD   0.5
2  DEF  CDE   0.3
3  CDE  DEF   0.3
4  BCD  ABC   0.5

unique() and duplicated() do not detect that rows 3 and 4 duplicate rows 2 and 1, because the values in Var1 and Var2 are swapped.
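
A quick check of why the built-in functions miss these rows, using the data from the question: duplicated() compares complete rows column by column, so ("CDE", "DEF") and ("DEF", "CDE") count as different rows. This is an illustrative sketch, not part of the original thread.

```r
dat <- data.frame(Var1 = c("ABC", "DEF", "CDE", "BCD"),
                  Var2 = c("BCD", "CDE", "DEF", "ABC"),
                  value = c(0.5, 0.3, 0.3, 0.5))

# Whole rows: every row differs from the earlier ones in column order
dup_all <- duplicated(dat)
# Only the two key columns: swapped pairs still look distinct
dup_pair <- duplicated(dat[c("Var1", "Var2")])

dup_all   # all FALSE
dup_pair  # all FALSE
```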

Since my data set is quite large, is there an efficient way to keep only the unique rows? The result should look like this:

  Var1 Var2 value
1  ABC  BCD   0.5
2  DEF  CDE   0.3

For your convenience, here is the data:

dat <- data.frame(Var1 = c("ABC", "DEF", "CDE", "BCD"),
                  Var2 = c("BCD", "CDE", "DEF", "ABC"),
                  value = c(0.5, 0.3, 0.3, 0.5))

Also, if possible, is there a way to produce a frequency table of the 20 most frequent values of Var1 (which has more than 10,000 levels)?

P.S. I have tried dat$count <- table(as.character(dat$Var1))[as.character(dat$Var1)], but it takes too long to run.
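
The following sketch, which is not part of the original thread, shows one base-R way to get both the top-20 frequency table and a per-row count. The Var1 values are hypothetical, with repeats added so the counts are non-trivial. The per-row count maps each value to an integer id with match() and counts ids with tabulate(), avoiding indexing the table by name; whether this is faster on the real data would need benchmarking.

```r
# Hypothetical Var1 values with repeats, standing in for the real column
dat <- data.frame(Var1 = c("ABC", "DEF", "CDE", "BCD", "ABC", "ABC", "DEF"))

# Frequency of every distinct value, sorted from most to least frequent;
# head() keeps at most the first 20 entries
counts <- table(dat$Var1)
top20 <- head(sort(counts, decreasing = TRUE), 20)
top20

# Per-row count: give each distinct value an integer id,
# count the ids with tabulate(), then look the counts up by id
x <- as.character(dat$Var1)
id <- match(x, unique(x))
dat$count <- tabulate(id)[id]
dat$count
# [1] 3 2 1 1 3 3 2
```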

Recommended answer

Another option is to sort Var1 and Var2 within each row and then apply duplicated() to the sorted pairs.

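# Sort each (Var1, Var2) pair so that swapped pairs become identical,
# then keep only the first occurrence of each pair
idx <- !duplicated(t(apply(dat[c("Var1", "Var2")], 1, sort)))
dat[idx, ]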
#  Var1 Var2 value
#1  ABC  BCD   0.5
#2  DEF  CDE   0.3
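
For a large data set, apply() tends to be slow because it converts the data frame to a character matrix and calls sort() once per row. A possible vectorised alternative (not part of the original answer) builds the same sorted pair with pmin() and pmax(), which compare the two columns element-wise. This sketch assumes Var1 and Var2 hold strings; as.character() is used because pmin() and pmax() do not work on unordered factors (the default column type before R 4.0).

```r
dat <- data.frame(Var1 = c("ABC", "DEF", "CDE", "BCD"),
                  Var2 = c("BCD", "CDE", "DEF", "ABC"),
                  value = c(0.5, 0.3, 0.3, 0.5))

v1 <- as.character(dat$Var1)
v2 <- as.character(dat$Var2)

# Vectorised equivalent of sorting each pair: smaller string first
key <- data.frame(lo = pmin(v1, v2), hi = pmax(v1, v2))

# duplicated() on a data frame compares whole rows of the key
idx <- !duplicated(key)
res <- dat[idx, ]
res
#  Var1 Var2 value
#1  ABC  BCD   0.5
#2  DEF  CDE   0.3
```

Using a two-column key data frame rather than paste() avoids accidental collisions when the values themselves contain the separator character.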
