How to remove duplicate rows in both using a condition in R
Problem description
The data I have looks something like this:
    RES1 <- c("A","B","A","A","B")
    RES2 <- c("B","A","A","B","A")
    VAL1 <- c(3,5,3,6,8)
    VAL2 <- c(5,3,7,2,7)
    dff <- data.frame(RES1,VAL1,RES2,VAL2)
    dff
      RES1 VAL1 RES2 VAL2
    1    A    3    B    5
    2    B    5    A    3
    3    A    3    A    7
    4    A    6    B    2
    5    B    8    A    7
I want to remove the rows where the same RES1-RES2 pair already appears. For example, A 3 interacts with B 5. That is the information I want, and I do not care which member of the pair comes first: B 5 with A 3 is the same as A 3 with B 5. What I want to get is the following data frame:
    output
      RES1 VAL1 RES2 VAL2
    1    A    3    B    5
    2    A    3    A    7
    3    A    6    B    2
    4    B    8    A    7
Then I want to do the same for another data frame, such as:
    RES3 <- c("B","B","B","A","B")
    RES4 <- c("A","A","A","A","B")
    VAL4 <- c(3,7,5,3,8)
    VAL3 <- c(5,8,3,7,3)
    df2 <- data.frame(RES3,VAL3,RES4,VAL4)
    df2
      RES3 VAL3 RES4 VAL4
    1    B    5    A    3
    2    B    8    A    7
    3    B    3    A    5
    4    A    7    A    3
    5    B    3    B    8
In the end, I just want to keep the mutual pairs. In my definition, two pairs with the same members are the same pair, and keeping only one of them is essential: "A 5" - "B 3" is the same as "B 3" - "A 5". In other words, order does not matter.
The final output I want should contain the following pairs, which are unique and which exist in BOTH data frames:
    mutualpairs
    RESA VALA RESB VALB
       A    3    B    5
       A    3    A    7
       B    8    A    7
Solution
You can use this code:
dff[!duplicated(t(apply(cbind(paste(dff$RES1,dff$VAL1),paste(dff$RES2,dff$VAL2)),1,sort))),]
Equivalent unrolled code:
    v1 <- paste(dff$RES1,dff$VAL1)
    v2 <- paste(dff$RES2,dff$VAL2)
    mx <- cbind(v1,v2)
    mxSorted <- t(apply(mx,1,sort))
    duped <- duplicated(mxSorted)
    dff[!duped,]
Explanation:
1) we create two character vectors, v1 and v2, by concatenating columns RES1-VAL1 and RES2-VAL2. Note that paste uses a space as its default separator; to be safer, you could use another character or string (e.g. |, @, ;).
Result:
    > v1
    [1] "A 3" "B 5" "A 3" "A 6" "B 8"
    > v2
    [1] "B 5" "A 3" "A 7" "B 2" "A 7"
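As a small illustration of the separator remark, here is a minimal sketch using made-up labels (they are not from the question's data, where single-letter residues and numeric values cannot collide). It shows how the default space separator can map two different label/value combinations to the same string, and how an explicit sep avoids that, assuming the chosen separator never occurs in the data:

```r
# Hypothetical labels that themselves contain a space
res <- c("A 1", "A")
val <- c("2", "1 2")

# Default separator (a space): both elements become "A 1 2"
ambiguous <- paste(res, val)

# Explicit separator assumed not to occur in the data
safe <- paste(res, val, sep = "|")

ambiguous[1] == ambiguous[2]   # TRUE: the two keys collide
safe[1] == safe[2]             # FALSE: "A 1|2" versus "A|1 2"
```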
2) we bind these two vectors to form a matrix using cbind.
Result:
         [,1]  [,2]
    [1,] "A 3" "B 5"
    [2,] "B 5" "A 3"
    [3,] "A 3" "A 7"
    [4,] "A 6" "B 2"
    [5,] "B 8" "A 7"
3) we sort the values of each row of the matrix using t(apply(mx,1,sort)). By sorting within each row, we make rows that hold the same two values in swapped order identical. The final transpose is necessary because apply places the result for each row in a column of its output.
Result:
         [,1]  [,2]
    [1,] "A 3" "B 5"
    [2,] "A 3" "B 5"
    [3,] "A 3" "A 7"
    [4,] "A 6" "B 2"
    [5,] "A 7" "B 8"
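To see why the transpose matters, the following sketch rebuilds the matrix from the vectors shown in step 1 and compares dimensions before and after apply. The name sortedRaw is introduced here only for illustration:

```r
v1 <- c("A 3", "B 5", "A 3", "A 6", "B 8")
v2 <- c("B 5", "A 3", "A 7", "B 2", "A 7")
mx <- cbind(v1, v2)

# apply() over rows returns each row's sorted values as one column
sortedRaw <- apply(mx, 1, sort)

dim(mx)            # 5 rows, 2 columns
dim(sortedRaw)     # 2 rows, 5 columns
dim(t(sortedRaw))  # 5 rows, 2 columns again, matching the data frame's rows
```

Without the t(), duplicated would compare the two rows of the 2 x 5 matrix rather than the five original pairs.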
4) calling duplicated on a matrix, we get a logical vector of length nrow(matrix) that is TRUE where a row is a duplicate of an earlier row. In our case, we get:
    [1] FALSE  TRUE FALSE FALSE FALSE
    # i.e. the second row is a duplicate
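A short sketch of the row-wise behaviour on a tiny hand-built matrix (the matrix m is made up for the example). It also shows the fromLast argument of duplicated, which the answer does not use but which may be handy if you would rather keep the last occurrence of each pair:

```r
m <- rbind(c("A 3", "B 5"),
           c("A 3", "B 5"),
           c("A 3", "A 7"))

# Default: a row is flagged when an identical row appears earlier
firstKept <- duplicated(m)

# fromLast = TRUE: a row is flagged when an identical row appears later
lastKept <- duplicated(m, fromLast = TRUE)

firstKept   # FALSE  TRUE FALSE
lastKept    #  TRUE FALSE FALSE
```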
5) finally we use this vector to filter the rows of the data.frame, getting the final result:
      RES1 VAL1 RES2 VAL2
    1    A    3    B    5
    3    A    3    A    7
    4    A    6    B    2
    5    B    8    A    7
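The answer above only deduplicates dff. One possible way to extend the same idea to the second part of the question (pairs present in both data frames) is sketched below. It is not part of the original answer: the helper pairKey is a hypothetical name, and the "|" separator is assumed not to occur in the data.

```r
# Build one order-independent key per row: sort the two "RES VAL" labels
# and join them with a separator assumed not to occur in the data.
pairKey <- function(res_a, val_a, res_b, val_b) {
  mx <- cbind(paste(res_a, val_a), paste(res_b, val_b))
  apply(mx, 1, function(r) paste(sort(r), collapse = "|"))
}

dff <- data.frame(RES1 = c("A","B","A","A","B"), VAL1 = c(3,5,3,6,8),
                  RES2 = c("B","A","A","B","A"), VAL2 = c(5,3,7,2,7))
df2 <- data.frame(RES3 = c("B","B","B","A","B"), VAL3 = c(5,8,3,7,3),
                  RES4 = c("A","A","A","A","B"), VAL4 = c(3,7,5,3,8))

key1 <- pairKey(dff$RES1, dff$VAL1, dff$RES2, dff$VAL2)
key2 <- pairKey(df2$RES3, df2$VAL3, df2$RES4, df2$VAL4)

# Keep rows of dff that are first occurrences and whose pair also occurs in df2
mutualpairs <- dff[!duplicated(key1) & key1 %in% key2, ]
names(mutualpairs) <- c("RESA", "VALA", "RESB", "VALB")
mutualpairs
```

On the question's data this keeps rows 1, 3 and 5 of dff, which are the three pairs listed in the desired mutualpairs output. Collapsing each sorted row into a single string makes the comparison across data frames a plain %in% test.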