Remove duplicates where values are swapped across 2 columns in R


Question

I have a simple data frame like this:

| id1 | id2 | location   | comment   |
|-----|-----|------------|-----------|
| 1   | 2   | Alaska     | cold      |
| 2   | 1   | Alaska     | freezing! |
| 3   | 4   | California | nice      |
| 4   | 5   | Kansas     | boring    |
| 9   | 10  | Alaska     | cold      |

The first two rows are duplicates because id1 and id2 both went to Alaska: the same pair of ids appears in both rows, just swapped between the two columns. It doesn't matter that their comments are different.

How can I remove one of these duplicates? Either one would be fine to remove.

I was first trying to sort id1 and id2, then get the index where they are duplicated, then go back and use the index to subset the original df. But I can't seem to pull this off.

df <- data.frame(id1 = c(1,2,3,4,9), id2 = c(2,1,4,5,10), location=c('Alaska', 'Alaska', 'California', 'Kansas', 'Alaska'), comment=c('cold', 'freezing!', 'nice', 'boring', 'cold'))


Recommended answer

We can use apply with MARGIN = 1 to sort the 'id' columns within each row, combine the result with 'location', and then use duplicated to get a logical index that can be used for removing or keeping the rows.

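# Sort id1/id2 within each row so (1, 2) and (2, 1) produce the same key,
# then drop rows whose (sorted ids, location) key has already been seen.
df[!duplicated(data.frame(t(apply(df[1:2], 1, sort)), df$location)),]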
#   id1 id2   location comment
#1   1   2     Alaska    cold
#3   3   4 California    nice
#4   4   5     Kansas  boring
#5   9  10     Alaska    cold
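
As a possible alternative for the two-column case, the same order-independent key can be built without apply by taking the row-wise minimum and maximum of the two id columns with pmin and pmax. This is only a sketch, not part of the original answer; the names lo, hi, is_dup and deduped are made up for illustration.

```r
# Example data from the question
df <- data.frame(
  id1 = c(1, 2, 3, 4, 9),
  id2 = c(2, 1, 4, 5, 10),
  location = c("Alaska", "Alaska", "California", "Kansas", "Alaska"),
  comment = c("cold", "freezing!", "nice", "boring", "cold")
)

# Order-independent key: smaller id first, larger id second
# (lo and hi are hypothetical helper names)
lo <- pmin(df$id1, df$id2)
hi <- pmax(df$id1, df$id2)

# TRUE for every row whose (lo, hi, location) combination appeared earlier
is_dup <- duplicated(data.frame(lo, hi, location = df$location))

# Keep only the first occurrence of each combination
deduped <- df[!is_dup, ]
```

Like the apply version, this keeps the first row of each duplicate group; passing fromLast = TRUE to duplicated would keep the last one instead. It relies on there being exactly two id columns, whereas the apply/sort approach extends to more.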
