Find unique pairs of words ignoring their order in two columns in R

Problem description

I have a data frame that contains duplicated values in two columns.

   dat<-data.frame(V1 = c("home","cat","fire","sofa","kitchen","sofa"), 
                    V2 = c("cat","home","water","TV","knife","TV"), V3 = c('date1','date1','date2','date3','date4','date3'))

       V1    V2    V3
1    home   cat date1
2     cat  home date1
3    fire water date2
4    sofa    TV date3
5 kitchen knife date4
6    sofa    TV date3

From this data frame I would like to obtain the unique pairs, ignoring the order in which each pair appears across the two columns.

This is the result I would like to obtain:

       V1    V2    V3
1    home   cat date1
2    fire water date2
3    sofa    TV date3
4 kitchen knife date4

Recommended answer

dat[!duplicated(t(apply(dat, 1, sort))),]

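apply(dat, 1, sort) loops over the rows of dat and sorts the values in each row, so home/cat and cat/home produce the same sorted vector. apply returns each sorted row as a column of a matrix, so t() transposes the result back to one row per record. duplicated then returns a logical vector that is TRUE for every row identical to an earlier one; negating it with ! keeps only the rows of dat where duplicated is FALSE, that is, the first occurrence of each pair. Note that this sorts all three columns, V3 included, so two rows count as duplicates only if their dates also match, which is the case in this example.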
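To make the steps concrete, here is a sketch that runs the same sort, transpose and duplicated pipeline one stage at a time. It departs from the one-liner in two ways, both assumptions rather than part of the recommended answer: it treats only V1 and V2 as defining a pair, so V3 is left out of the sort (pair_cols is a name introduced here for illustration), and it resets the row names so the result is numbered 1 to 4 like the desired output (the one-liner keeps the original row names 1, 3, 4, 5). The last lines show a swapped-in, vectorised alternative that builds the same order-independent key with pmin and pmax instead of a row-by-row apply.

```r
# Example data from the question. stringsAsFactors = FALSE keeps the columns
# as character vectors on R versions before 4.0 as well.
dat <- data.frame(
  V1 = c("home", "cat", "fire", "sofa", "kitchen", "sofa"),
  V2 = c("cat", "home", "water", "TV", "knife", "TV"),
  V3 = c("date1", "date1", "date2", "date3", "date4", "date3"),
  stringsAsFactors = FALSE
)

# Assumption (hypothetical helper name): only these two columns define a pair;
# V3 is carried along but does not take part in the comparison.
pair_cols <- c("V1", "V2")

# Step 1: sort the words within each row. apply(..., 1, ...) returns one
# column per input row, so t() turns it back into one row per record.
sorted_pairs <- t(apply(dat[, pair_cols], 1, sort))

# Step 2: TRUE for every row whose sorted pair already appeared earlier.
dup <- duplicated(sorted_pairs)

# Step 3: keep the first occurrence of each pair and renumber the rows.
res <- dat[!dup, ]
rownames(res) <- NULL
print(res)

# Alternative (not from the answer): element-wise pmin/pmax give the
# alphabetically smaller and larger word of each pair, which is the same
# order-independent key without looping over rows.
key <- data.frame(lo = pmin(dat$V1, dat$V2), hi = pmax(dat$V1, dat$V2))
res2 <- dat[!duplicated(key), ]
rownames(res2) <- NULL
print(res2)
```

Both versions keep the first occurrence of each pair, so the kept rows retain the column order they had in the input (home, cat rather than cat, home). How mixed-case words such as sofa and TV are ordered depends on the locale, but every copy of a pair is sorted the same way, so duplicate detection is unaffected.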