Apply regexp in one data frame based on the column in another data frame
Question
I have two data frames: table A is the pattern table, and table B is the name table. I want to subset table B to the rows whose name matches any pattern in table A.
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"))
B <- data.frame(name = "aa1", "bb1", "abc", "def" ,"ddd")
I'm trying to write a for loop that looks like this:
for (i in 1:nrow(A)) {
  for (j in 1:nrow(B)) {
    DT <- data.frame(grep(A$pattern[i], B$name[j], ignore.case = T, value = T))
  }
}
I want my resulting table DT to contain only aa1, bb1, and ddd.
But it's super slow. Is there a more efficient way to do it? Many thanks!
Recommended answer
It appears there's a slight error in your sample input data: the values for B$name are not wrapped in c(), so the column is not declared properly, and both data.frame objects should be created with stringsAsFactors = F:
> A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = F)
> B <- data.frame(name = c("aa1", "bb1", "abc", "def" ,"ddd"), stringsAsFactors = F)
Code
# using sapply with grepl: test every name in B against every pattern in A
> indices <- rowSums(sapply(A$pattern, grepl, x = B$name)) > 0
> indices
[1]  TRUE  TRUE FALSE FALSE  TRUE
> B[indices, ]
[1] "aa1" "bb1" "ddd"
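Since the question asks for something faster than nested loops, here is one possible vectorized alternative. It is a sketch, not the answerer's method: it assumes the patterns can safely be joined into a single alternation regex with "|", which holds for plain strings like the ones in the sample data but may need care if a pattern contains its own regex metacharacters. The names combined and keep are illustrative only.

```r
# Sample data from the question, with B$name built via c()
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "abc", "def", "ddd"), stringsAsFactors = FALSE)

# Collapse all patterns into one alternation: "aa|bb|cc|dd"
combined <- paste(A$pattern, collapse = "|")

# A single vectorized grepl call over every name;
# ignore.case = TRUE mirrors the question's original loop
keep <- grepl(combined, B$name, ignore.case = TRUE)

# drop = FALSE keeps the result a data frame even though B has one column
DT <- B[keep, , drop = FALSE]
print(DT)
```

With the sample data, DT keeps the rows aa1, bb1, and ddd. Because grepl is called only once for the whole name column, this avoids looping over patterns and names in R code.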