How to remove duplicate rows in both using a condition in R

Problem description

The data I have looks something like this:

RES1 <- c("A","B","A","A","B")
RES2 <- c("B","A","A","B","A")
VAL1 <-c(3,5,3,6,8)
VAL2 <- c(5,3,7,2,7)
dff <- data.frame(RES1,VAL1,RES2,VAL2)
dff
  RES1 VAL1 RES2 VAL2
1    A    3    B    5
2    B    5    A    3
3    A    3    A    7
4    A    6    B    2
5    B    8    A    7

I want to remove the rows whose RES1-RES2 pair already appears in an earlier row. For example, A 3 interacts with B 5; that is the information I want, and I do not care which member of the pair comes first: B 5 with A 3 is the same as A 3 with B 5. What I want to get is the following data frame:

output
  RES1 VAL1 RES2 VAL2
1    A    3    B    5
2    A    3    A    7
3    A    6    B    2
4    B    8    A    7

Then I want to do the same for another data frame, such as:

RES3 <- c("B","B","B","A","B")
RES4 <- c("A","A","A","A","B")
VAL4 <- c(3,7,5,3,8)
VAL3 <- c(5,8,3,7,3)
df2 <- data.frame(RES3,VAL3,RES4,VAL4)

df2
  RES3 VAL3 RES4 VAL4
1    B    5    A    3
2    B    8    A    7
3    B    3    A    5
4    A    7    A    3
5    B    3    B    8

In the end, I just want to keep the mutual pairs. By my definition, a pair is the same regardless of order, and only one copy should be kept: "A 5" - "B 3" is the same as "B 3" - "A 5". In other words, order does not matter.

The final output should contain the following pairs, which are unique and exist in BOTH data frames:

mutualpairs
  RESA VALA RESB VALB
  A     3     B     5
  A     3     A     7
  B     8     A     7

Solution

You can use this code:

dff[!duplicated(t(apply(cbind(paste(dff$RES1,dff$VAL1),paste(dff$RES2,dff$VAL2)),1,sort))),]

Equivalent unrolled code:

v1 <- paste(dff$RES1,dff$VAL1)
v2 <- paste(dff$RES2,dff$VAL2)
mx <- cbind(v1,v2)
mxSorted <- t(apply(mx,1,sort))
duped <- duplicated(mxSorted)
dff[!duped,]
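
The unrolled steps above can be wrapped in a small function so they can be reused on a data frame with different column names. This is only a sketch: the name dedupe_pairs and its arguments are hypothetical and not part of the original answer, and it uses "|" instead of a space as the paste separator.

```r
# Hypothetical helper (not from the original answer): keep the first row for
# each unordered pair of (res, val) combinations.
dedupe_pairs <- function(df, res1, val1, res2, val2, sep = "|") {
  k1 <- paste(df[[res1]], df[[val1]], sep = sep)   # key for the first member
  k2 <- paste(df[[res2]], df[[val2]], sep = sep)   # key for the second member
  sorted <- t(apply(cbind(k1, k2), 1, sort))       # order-independent rows
  df[!duplicated(sorted), , drop = FALSE]
}

dff <- data.frame(
  RES1 = c("A", "B", "A", "A", "B"), VAL1 = c(3, 5, 3, 6, 8),
  RES2 = c("B", "A", "A", "B", "A"), VAL2 = c(5, 3, 7, 2, 7)
)
result <- dedupe_pairs(dff, "RES1", "VAL1", "RES2", "VAL2")
result
```

The same call should work for the second data frame by passing "RES3", "VAL3", "RES4", "VAL4" as the column names.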

Explanation:

1) We create two character vectors, v1 and v2, by concatenating columns RES1-VAL1 and RES2-VAL2. Note that paste uses a space as its default separator; to be safer, you could use another character or string (e.g. |, @, ;).
Result:

> v1
[1] "A 3" "B 5" "A 3" "A 6" "B 8"
> v2
[1] "B 5" "A 3" "A 7" "B 2" "A 7"
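
To illustrate why a different separator can be safer, here is a small sketch with made-up values (they do not come from the question's data) in which the default space makes two different inputs collapse into the same key:

```r
# Hypothetical inputs that already contain spaces
res <- c("A 1", "A")
val <- c("2", "1 2")

collided <- paste(res, val)             # default sep = " "
safe     <- paste(res, val, sep = "|")  # explicit separator

collided[1] == collided[2]  # TRUE: both keys are "A 1 2"
safe[1] == safe[2]          # FALSE: "A 1|2" vs "A|1 2"
```

With the question's data (single letters and plain numbers) the default separator is not a problem; this only matters if the values themselves can contain the separator.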

2) We bind these two vectors into a matrix using cbind.
Result:

     [,1]  [,2] 
[1,] "A 3" "B 5"
[2,] "B 5" "A 3"
[3,] "A 3" "A 7"
[4,] "A 6" "B 2"
[5,] "B 8" "A 7"

3) We sort the values within each row of the matrix using t(apply(mx,1,sort)). Sorting makes rows that hold the same two values in swapped order identical. The final transpose is necessary because apply returns the result for each row as a column.
Result:

     [,1]  [,2] 
[1,] "A 3" "B 5"
[2,] "A 3" "B 5"
[3,] "A 3" "A 7"
[4,] "A 6" "B 2"
[5,] "A 7" "B 8"
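
The need for the transpose can be checked by looking at the dimensions before and after t(). This sketch rebuilds v1 and v2 by hand so it runs on its own; the name raw is only used here for illustration:

```r
v1 <- c("A 3", "B 5", "A 3", "A 6", "B 8")
v2 <- c("B 5", "A 3", "A 7", "B 2", "A 7")
mx <- cbind(v1, v2)

raw <- apply(mx, 1, sort)  # each sorted row comes back as a column
dim(raw)                   # 2 5

mxSorted <- t(raw)         # back to one row per original row
dim(mxSorted)              # 5 2
```

Without the transpose, duplicated would compare the two rows of the 2 x 5 matrix instead of the five original rows.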

4) Calling duplicated on the matrix returns a logical vector of length nrow(matrix) that is TRUE wherever a row duplicates an earlier row. In our case we get:

[1] FALSE  TRUE FALSE FALSE FALSE
# i.e. the second row is a duplicate

5) Finally, we use this vector to filter the rows of the data.frame, giving the final result:

  RES1 VAL1 RES2 VAL2
1    A    3    B    5
3    A    3    A    7
4    A    6    B    2
5    B    8    A    7
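
The answer above covers de-duplication within a single data frame. For the second part of the question (pairs that exist in BOTH data frames), one possible approach is to build an order-independent key per row and keep the rows of dff whose key also occurs in df2. This is a sketch under assumptions, not part of the original answer: the helper pair_key and the " <-> " joiner are hypothetical names chosen for illustration.

```r
dff <- data.frame(
  RES1 = c("A", "B", "A", "A", "B"), VAL1 = c(3, 5, 3, 6, 8),
  RES2 = c("B", "A", "A", "B", "A"), VAL2 = c(5, 3, 7, 2, 7)
)
df2 <- data.frame(
  RES3 = c("B", "B", "B", "A", "B"), VAL3 = c(5, 8, 3, 7, 3),
  RES4 = c("A", "A", "A", "A", "B"), VAL4 = c(3, 7, 5, 3, 8)
)

# Hypothetical helper: one key per row, with the two members always in the
# same (sorted) order, e.g. "A|3 <-> B|5".
pair_key <- function(res_a, val_a, res_b, val_b) {
  a <- paste(res_a, val_a, sep = "|")
  b <- paste(res_b, val_b, sep = "|")
  ifelse(a <= b, paste(a, b, sep = " <-> "), paste(b, a, sep = " <-> "))
}

key1 <- pair_key(dff$RES1, dff$VAL1, dff$RES2, dff$VAL2)
key2 <- pair_key(df2$RES3, df2$VAL3, df2$RES4, df2$VAL4)

# Keep the first occurrence of each pair in dff that also appears in df2
keep <- !duplicated(key1) & key1 %in% key2
mutualpairs <- setNames(dff[keep, ], c("RESA", "VALA", "RESB", "VALB"))
mutualpairs
```

On the question's data this keeps rows 1, 3 and 5 of dff, which are the three pairs listed in the desired mutualpairs output.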
