Remove duplicates where values are swapped across 2 columns in R
Question
I have a simple dataframe like this:
| id1 | id2 | location | comment |
|-----|-----|------------|-----------|
| 1 | 2 | Alaska | cold |
| 2 | 1 | Alaska | freezing! |
| 3 | 4 | California | nice |
| 4 | 5 | Kansas | boring |
| 9 | 10 | Alaska | cold |
The first two rows are duplicates because id1 and id2 both went to Alaska. It doesn't matter that their comments are different.
How can I remove one of these duplicates? Either one would be fine to remove.
I first tried to sort id1 and id2, then get the index where they are duplicated, then go back and use that index to subset the original df. But I can't seem to pull this off.
df <- data.frame(id1 = c(1,2,3,4,9), id2 = c(2,1,4,5,10), location=c('Alaska', 'Alaska', 'California', 'Kansas', 'Alaska'), comment=c('cold', 'freezing!', 'nice', 'boring', 'cold'))
Recommended answer
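We can use apply with MARGIN=1 to sort the two 'id' columns within each row, combine the sorted ids with 'location', and then use duplicated to get a logical index that can be used for removing/keeping the rows. After sorting, the pairs (1, 2) and (2, 1) both become (1, 2), so the second Alaska row is flagged as a duplicate of the first.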
df[!duplicated(data.frame(t(apply(df[1:2], 1, sort)), df$location)),]
# id1 id2 location comment
#1 1 2 Alaska cold
#3 3 4 California nice
#4 4 5 Kansas boring
#5 9 10 Alaska cold
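As an alternative to the row-wise apply above, the same order-independent key can probably be built with the vectorized base R functions pmin and pmax. This is only a sketch of a different technique, not the answer's method. The names key, lo, hi and deduped are made up for illustration, and it assumes exactly two id columns of a comparable type.

```r
# Sample data from the question
df <- data.frame(
  id1 = c(1, 2, 3, 4, 9),
  id2 = c(2, 1, 4, 5, 10),
  location = c('Alaska', 'Alaska', 'California', 'Kansas', 'Alaska'),
  comment = c('cold', 'freezing!', 'nice', 'boring', 'cold')
)

# Build an order-independent key: smaller id first, larger id second.
# 'lo', 'hi' and 'key' are illustrative names, not part of the original answer.
key <- data.frame(
  lo = pmin(df$id1, df$id2),
  hi = pmax(df$id1, df$id2),
  location = df$location
)

# duplicated() marks every repeat of a key after its first occurrence,
# so negating it keeps the first row of each (lo, hi, location) group.
deduped <- df[!duplicated(key), ]
print(deduped)
```

Because pmin and pmax work on whole columns at once, this avoids calling sort once per row, which may matter on large data frames. It does not generalize to more than two id columns as directly as the apply version does.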