Compare group of two columns and return index matches R
Problem description
Many thanks for reading. Apologies for what I'm sure is a simple task.
I have a dataframe: (Edited: Added extra column not to be included in comparison)
b = c(5, 6, 7, 8, 10, 11)
c = c('david','alan','pete', 'ben', 'richard', 'edd')
d = c('alex','edd','ben','pete','raymond', 'alan')
df = data.frame(b, c, d)
df
   b       c       d
1  5   david    alex
2  6    alan     edd
3  7    pete     ben
4  8     ben    pete
5 10 richard raymond
6 11     edd    alan
I want to compare the group of columns c and d with the group of columns d and c. That is, for one row, I want to compare the combined values in c and d with the combined values in d and c for all other rows.
(Note the values could either be characters or integers)
Where these match, I want to return the indexes of the matching rows, preferably as a list of lists. I need to be able to access the indexes without referring to the values in column c or d.
I.e. for the above dataframe, my expected output would be:
c(c(2, 6), c(3, 4))
((2,6), (3,4))
Because:
Row 2: (c + d == alan + edd) = row 6: (d + c == edd + alan)
Row 3: (c + d == pete + ben) = row 4: (d + c == ben + pete)
I understand how to determine the match case for two separate columns using match or melt, but not when the columns are joined together and every possible row combination has to be checked.
What I envisaged was something like:
lapply(1:6, function(x), ifelse((df$a & df$b) == (df$b & df$a), index(x), 0))
But obviously that is incorrect and won't work.
I consulted the following questions but have been unable to formulate an answer. I have no idea where to begin.
Matching multiple columns on different data frames and getting other column as result
Match two columns with two other columns (https://stackoverflow.com/questions/6880450/match-two-columns-with-two-other-columns)
How can I achieve the above?
Recommended answer
You could do something like this. It splits the row indices seq_len(nrow(df)) according to unique sorted strings formed from columns c and d of df. The sorting ensures that A,B and B,A are treated identically. Column b is left out so that it does not take part in the comparison.
duplist <- split(seq_len(nrow(df)), apply(df[, c("c", "d")], 1, function(r) paste(sort(r), collapse = " ")))
duplist
$`alan edd`
[1] 2 6
$`alex david`
[1] 1
$`ben pete`
[1] 3 4
$`raymond richard`
[1] 5
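The question asks only for the rows that have a match, and for the indexes to be usable without referring to the values in c or d. One way to get there from the split result is to drop the groups that contain a single row and then remove the names. The following is a minimal sketch, assuming the example data from the question; the names key and matches are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question.
df <- data.frame(
  b = c(5, 6, 7, 8, 10, 11),
  c = c("david", "alan", "pete", "ben", "richard", "edd"),
  d = c("alex", "edd", "ben", "pete", "raymond", "alan"),
  stringsAsFactors = FALSE
)

# Order-independent key built from columns c and d only (b is ignored).
key <- apply(df[, c("c", "d")], 1, function(r) paste(sort(r), collapse = " "))

# Group the row indices by key, as in the answer above.
duplist <- split(seq_len(nrow(df)), key)

# Keep only groups with more than one row and drop the names,
# so the indexes can be used without referring to the values.
matches <- unname(Filter(function(idx) length(idx) > 1, duplist))
matches
```

For the example data, matches should be a list of two integer vectors, 2 6 and 3 4, so matches[[1]] gives the first pair of matching rows.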