Find unique pairs of words ignoring their order in two columns in R
Problem description
I have a data frame that contains duplicated values in two columns.
dat<-data.frame(V1 = c("home","cat","fire","sofa","kitchen","sofa"),
V2 = c("cat","home","water","TV","knife","TV"), V3 = c('date1','date1','date2','date3','date4','date3'))
V1 V2 V3
1 home cat date1
2 cat home date1
3 fire water date2
4 sofa TV date3
5 kitchen knife date4
6 sofa TV date3
I would like to obtain from this dataframe unique pairs ignoring the order in which the pair is presented between the two columns.
This would be the result that I would like to obtain:
V1 V2 V3
1 home cat date1
2 fire water date2
3 sofa TV date3
4 kitchen knife date4
Recommended answer
dat[!duplicated(t(apply(dat, 1, sort))),]
Using apply and sort loops through each row and sorts its values. Because apply returns the sorted rows as columns, we transpose the output with t and then identify duplicates using duplicated. Because duplicated returns a logical vector, we then subset dat to the rows where duplicated is FALSE.
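One caveat: apply(dat, 1, sort) sorts all three columns of each row, so two rows only count as duplicates when their V3 values also match. If pairs should be compared on V1 and V2 alone, a minimal sketch could restrict the sort to those two columns. The names pair_key and result are illustrative and not part of the original answer.

```r
dat <- data.frame(
  V1 = c("home", "cat", "fire", "sofa", "kitchen", "sofa"),
  V2 = c("cat", "home", "water", "TV", "knife", "TV"),
  V3 = c("date1", "date1", "date2", "date3", "date4", "date3"),
  stringsAsFactors = FALSE
)

# Sort only the two word columns within each row. apply() returns the
# sorted rows as columns, so t() turns them back into one row per record.
pair_key <- t(apply(dat[, c("V1", "V2")], 1, sort))

# duplicated() on a matrix flags every row that repeats an earlier row.
result <- dat[!duplicated(pair_key), ]

# Optional: renumber the rows 1..n, as in the desired output.
rownames(result) <- NULL
print(result)
```

With this version, a repeated pair is dropped even when its V3 value differs from the first occurrence, and the first occurrence is the one that is kept.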