R: selecting all rows from a data frame that don't appear in another


Problem description

I'm trying to solve a tricky R problem that I haven't been able to solve by Googling keywords. Specifically, I'm trying to take a subset of one data frame whose values don't appear in another. Here is an example:

> test
      number    fruit     ID1  ID2 
item1 "number1" "apples"  "22" "33"
item2 "number2" "oranges" "13" "33"
item3 "number3" "peaches" "44" "25"
item4 "number4" "apples"  "12" "13"
> test2
      number    fruit     ID1   ID2 
item1 "number1" "papayas" "22"  "33"
item2 "number2" "oranges" "13"  "33"
item3 "number3" "peaches" "441" "25"
item4 "number4" "apples"  "123" "13"
item5 "number3" "peaches" "44"  "25"
item6 "number4" "apples"  "12"  "13"
item7 "number1" "apples"  "22"  "33"

I have two data frames, test and test2, and the goal is to select every entire row in test2 that doesn't appear in test, even though some of the individual values may be the same.

The output I want would look like:

item1 "number1" "papayas" "22"  "33"
item2 "number3" "peaches" "441" "25"
item3 "number4" "apples"  "123" "13"

There may be an arbitrary number of rows or columns, but in my specific case, one data frame is a direct subset of the other.

I've used the R subset(), merge() and which() functions extensively, but couldn't figure out how to use these in combination, if it's possible at all, to get what I want.

Edit: Here is the R code I used to generate these two tables.

test <- data.frame(c("number1", "apples", 22, 33), c("number2", "oranges", 13, 33),
    c("number3", "peaches", 44, 25), c("number4", "apples", 12, 13))

# t() turns the data frame into a character matrix with one row per item
test <- t(test)
rownames(test) <- c("item1", "item2", "item3", "item4")
colnames(test) <- c("number", "fruit", "ID1", "ID2")

# Seven items, matching the test2 printout above
test2 <- data.frame(c("number1", "papayas", 22, 33), c("number2", "oranges", 13, 33),
    c("number3", "peaches", 441, 25), c("number4", "apples", 123, 13),
    c("number3", "peaches", 44, 25), c("number4", "apples", 12, 13),
    c("number1", "apples", 22, 33))

test2 <- t(test2)
rownames(test2) <- c("item1", "item2", "item3", "item4", "item5", "item6", "item7")
colnames(test2) <- c("number", "fruit", "ID1", "ID2")
Thanks in advance!

Solution

Here's another way:

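# Stack test below test2 so every row of test2 that also occurs in test
# has an identical copy further down
x <- rbind(test2, test)
# Keep rows with no later copy, restricted to the rows that came from test2
x[! duplicated(x, fromLast=TRUE) & seq(nrow(x)) <= nrow(test2), ]
#        number   fruit ID1 ID2
# item1 number1 papayas  22  33
# item3 number3 peaches 441  25
# item4 number4  apples 123  13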

Edit: modified to preserve row names.
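
For reuse, the same idea can be wrapped in a small function. The sketch below is only an illustration: the helper name rows_not_in is hypothetical and not part of the original answer, and the example data is rebuilt as real data frames so the block runs on its own. It assumes both inputs have the same columns in the same order.

```r
# Hypothetical helper: rows of `a` that do not appear anywhere in `b`.
# Assumes `a` and `b` have identical columns in the same order.
rows_not_in <- function(a, b) {
  x <- rbind(a, b)
  # TRUE for rows that have no identical copy further down in x
  last_copy <- !duplicated(x, fromLast = TRUE)
  # TRUE only for the rows that came from `a`
  from_a <- seq_len(nrow(x)) <= nrow(a)
  x[last_copy & from_a, , drop = FALSE]
}

test <- data.frame(
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("apples", "oranges", "peaches", "apples"),
  ID1    = c(22, 13, 44, 12),
  ID2    = c(33, 33, 25, 13),
  row.names = paste0("item", 1:4),
  stringsAsFactors = FALSE
)

test2 <- data.frame(
  number = c("number1", "number2", "number3", "number4",
             "number3", "number4", "number1"),
  fruit  = c("papayas", "oranges", "peaches", "apples",
             "peaches", "apples", "apples"),
  ID1    = c(22, 13, 441, 123, 44, 12, 22),
  ID2    = c(33, 33, 25, 13, 25, 13, 33),
  row.names = paste0("item", 1:7),
  stringsAsFactors = FALSE
)

result <- rows_not_in(test2, test)
print(result)
```

One side effect of relying on duplicated(): if the first argument contains the same unmatched row more than once, only its last occurrence is returned.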
