Checking for duplicates across two columns in R

Views: 73 Published: 2020/8/1 19:56:00 Tags: r, duplicates

Problem description

For example, my data set looks like this:

  Var1 Var2 value
1  ABC  BCD   0.5
2  DEF  CDE   0.3
3  CDE  DEF   0.3
4  BCD  ABC   0.5

unique and duplicated do not detect rows 3 and 4 as duplicates of rows 2 and 1, because the Var1 and Var2 values are swapped.

Since my data set is quite large, is there an efficient way to keep only the unique rows? Like this:

  Var1 Var2 value
1  ABC  BCD   0.5
2  DEF  CDE   0.3

For your convenience, you can use:

dat <- data.frame(Var1 = c("ABC", "DEF", "CDE", "BCD"),
                  Var2 = c("BCD", "CDE", "DEF", "ABC"),
                  value = c(0.5, 0.3, 0.3, 0.5))
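To illustrate the issue on this sample data, here is a small check (a sketch, not part of the original question) showing that duplicated flags nothing when the pairs are compared as they stand. The names flagged_all and flagged_pairs are illustrative only.

```r
# Sample data from the question, rebuilt here so the block runs on its own.
dat <- data.frame(Var1 = c("ABC", "DEF", "CDE", "BCD"),
                  Var2 = c("BCD", "CDE", "DEF", "ABC"),
                  value = c(0.5, 0.3, 0.3, 0.5))

# duplicated() compares rows position by position, so (ABC, BCD) and
# (BCD, ABC) count as different rows.
flagged_all   <- duplicated(dat)
flagged_pairs <- duplicated(dat[c("Var1", "Var2")])

any(flagged_all)    # FALSE
any(flagged_pairs)  # FALSE
```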

Also, if possible, is there a way to produce a distribution table for the top 20 values of Var1 (which has more than 10,000 levels)?

P.S. I have tried dat$count <- dat(as.character(dat$Var1))[as.character(dat$Var1)], but it takes too long to run.
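For the counting part, one possible sketch (an assumption about what is wanted, not something confirmed in the thread) is to tabulate Var1 once with table, sort the counts, and take the first 20 entries. The names var1_counts and top20 are illustrative.

```r
# Sample data from the question, rebuilt here so the block runs on its own.
dat <- data.frame(Var1 = c("ABC", "DEF", "CDE", "BCD"),
                  Var2 = c("BCD", "CDE", "DEF", "ABC"),
                  value = c(0.5, 0.3, 0.3, 0.5))

# Count each Var1 value once; as.character() drops unused factor levels.
var1_counts <- table(as.character(dat$Var1))

# The (up to) 20 most frequent Var1 values, most frequent first.
top20 <- head(sort(var1_counts, decreasing = TRUE), 20)

# Optional per-row count column, looked up by name from the one table.
dat$count <- as.integer(var1_counts[as.character(dat$Var1)])
```

With the four-row sample every value occurs once, so top20 has four entries and dat$count is all 1; on the real data only the 20 most frequent values would be kept.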

Recommended answer

Another option is to sort Var1 and Var2 within each row, so that (BCD, ABC) becomes (ABC, BCD), and then apply duplicated to the sorted pairs.

idx <- !duplicated(t(apply(dat[c("Var1", "Var2")], 1, sort)))
dat[idx, ]
#  Var1 Var2 value
#1  ABC  BCD   0.5
#2  DEF  CDE   0.3
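Since apply visits the rows one at a time, it may be slow on a very large data set. A vectorized variant of the same idea (a sketch, not part of the original answer) uses pmin and pmax to put the smaller value of each pair first. The names a, b, key, idx_vec and dedup are illustrative, and stringsAsFactors = FALSE is added here as an assumption so the columns are plain character vectors.

```r
# Sample data from the question, rebuilt here so the block runs on its own.
dat <- data.frame(Var1 = c("ABC", "DEF", "CDE", "BCD"),
                  Var2 = c("BCD", "CDE", "DEF", "ABC"),
                  value = c(0.5, 0.3, 0.3, 0.5),
                  stringsAsFactors = FALSE)

# as.character() guards against factor columns, on which pmin/pmax fail.
a <- as.character(dat$Var1)
b <- as.character(dat$Var2)

# Order each pair elementwise: the smaller string goes to lo, the larger to hi.
key <- data.frame(lo = pmin(a, b), hi = pmax(a, b))

# Keep the first occurrence of each unordered pair.
idx_vec <- !duplicated(key)
dedup <- dat[idx_vec, ]
```

On the sample data this keeps rows 1 and 2, matching the result of the apply-based version. Like that version, it compares only Var1 and Var2 and ignores value.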
