Check if each row of a data frame is contained in another data frame
Problem description
I wrote the following function, and it works. However, it is very slow when df1 has 1700 rows and df2 has 70000 rows. Is there any way to improve its efficiency?

    rowcheck <- function(df1, df2){
      # For each row x of df1, scan every row y of df2 for an exact match
      apply(df1, 1, function(x) any(apply(df2, 1, function(y) all(y == x))))
    }
Here is an example of the data I apply this function to. I want to check whether each row of df1 is contained as a row in df2:

    df1 = data.frame(a = c(1:3), b = c("a", "b", "c"))
    df2 = data.frame(a = c(1:6), b = rep(c("a", "b", "c"), 2))

I want the function to return a logical vector of length nrow(df1).
Thank you for your help.
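Part of the slowness comes from how apply() works: it coerces a data frame to a matrix with as.matrix() before looping, so the inner apply(df2, 1, ...) re-converts all 70000 rows of df2 once for each of the 1700 rows of df1, and then makes one R-level function call per row pair (about 1700 × 70000 ≈ 119 million calls). A minimal sketch that keeps the same row-by-row logic, but converts each data frame only once and replaces the inner loop with a vectorised comparison, might look like this. The name rowcheck_matrix is hypothetical, it inherits the original's as.matrix() conversion, and it is still quadratic in the number of rows.

```r
# Hypothetical variant of the question's rowcheck(): same pairwise logic,
# but each data frame is coerced to a matrix once instead of once per row.
rowcheck_matrix <- function(df1, df2) {
  m1 <- as.matrix(df1)
  # Transpose so that each column of t2 is one row of df2
  t2 <- t(as.matrix(df2))
  apply(m1, 1, function(x) {
    # t2 == x recycles x down every column, i.e. compares x with every row of df2;
    # a row of df2 matches when all length(x) fields are equal
    any(colSums(t2 == x) == length(x))
  })
}
```

This removes the 119 million small function calls, but the answer below, which hashes whole rows, scales much better for data of this size.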
Solution

One way is to paste the fields of each row together and compare the resulting strings. Also, for efficiency, it's best to separate the apply calls instead of nesting them. The result is a logical vector of length nrow(df1), as you requested.

    rowcheck <- function(df1, df2){
      # Collapse each row of df1 and of df2 into a single string
      xx <- apply(df1, 1, paste, collapse = "")
      yy <- apply(df2, 1, paste, collapse = "")
      # For each row string of df1, test whether it occurs among the row strings of df2
      zz <- xx %in% yy
      return(zz)
    }

    rowcheck(df1, df2)
    ## [1] TRUE TRUE TRUE
    rowcheck(df2, df1)
    ## [1] TRUE TRUE TRUE FALSE FALSE FALSE
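Two details of the pasted-rows approach are worth hedging. With collapse = "", different rows can produce the same string: (1, "1a") and (11, "a") both become "11a". And apply() goes through as.matrix(), which formats numeric columns with format() and can pad them to a common width (for example " 1" next to "10"), so the same number may produce different strings in df1 and df2. A minimal sketch of the same idea that avoids both issues, assuming the separator "\r" never occurs in the data and that both data frames have the same columns in the same order, builds the row keys with a vectorised do.call(paste, ...) instead of apply(). The name rowcheck_keys and the sep default are illustrative choices, not part of the original answer.

```r
# Hypothetical helper: one string key per row, built without apply().
# Assumes the separator never appears in the data and that df1 and df2
# have the same columns in the same order.
rowcheck_keys <- function(df1, df2, sep = "\r") {
  # paste() works element-wise across the columns, giving one string per row;
  # each column goes through as.character(), so numbers are not padded.
  # unname() stops column names from being matched to paste()'s own arguments.
  key1 <- do.call(paste, c(unname(as.list(df1)), sep = sep))
  key2 <- do.call(paste, c(unname(as.list(df2)), sep = sep))
  # %in% is built on match(), which uses hashing rather than pairwise comparison
  key1 %in% key2
}
```

With the example data from the question, rowcheck_keys(df1, df2) returns TRUE TRUE TRUE, and rows such as (1, "1a") and (11, "a") no longer collide because their keys are "1\r1a" and "11\ra".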