Apply a regexp in one data frame based on the column in another data frame

Question

I have two data frames: table A is the pattern table, and table B is the name table. I want to subset table B to the names that match any of the patterns in table A.

A <- data.frame(pattern = c("aa", "bb", "cc", "dd"))
B <- data.frame(name = "aa1", "bb1", "abc", "def" ,"ddd")

I'm trying to do it with a for loop that looks like this:

for (i in 1:nrow(A)){
  for (j in 1:nrow(B)){
    # note: DT is reassigned on every iteration, so only the result of the last (i, j) pair survives
    DT <- data.frame(grep(A$pattern[i], B$name[j], ignore.case = T, value = T))
  }
}

And I want my resulting table DT to contain only aa1, bb1, and ddd.

But it's super slow. I'm just wondering if there's a more efficient way to do it. Many thanks!

Answer

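It appears there's a slight error in your sample input data: B$name is not declared properly (the values need to be wrapped in c(...)), and both data.frame objects should include stringsAsFactors = F so that the columns are character vectors rather than factors (FALSE is the default from R 4.0.0 onward):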

> A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = F)
> B <- data.frame(name = c("aa1", "bb1", "abc", "def" ,"ddd"), stringsAsFactors = F)
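To see why the original declaration of B does not work, the sketch below (B_bad is a hypothetical name used only for illustration) compares it with the corrected one. Without c(...), "bb1", "abc", "def" and "ddd" are passed to data.frame() as extra unnamed arguments, so each becomes its own column; B_bad has a single row, which also means nrow(B) was 1 in the question's loop.

```r
# Question's version: the extra strings become separate columns, not values of `name`
B_bad <- data.frame(name = "aa1", "bb1", "abc", "def", "ddd")
dim(B_bad)                  # 1 row, 5 columns
as.character(B_bad$name)    # only "aa1"

# Corrected version: one column `name` holding five values
B <- data.frame(name = c("aa1", "bb1", "abc", "def", "ddd"), stringsAsFactors = FALSE)
dim(B)                      # 5 rows, 1 column
```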

Code

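# using sapply with grepl: test every pattern against every name (one column per pattern),
# then keep the names that match at least one pattern
> matches <- sapply(1:nrow(A), function(z) grepl(A$pattern[z], B$name))
> indices <- rowSums(matches) > 0
> indices
[1]  TRUE  TRUE FALSE FALSE  TRUE

> B[indices, ]
[1] "aa1" "bb1" "ddd"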
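Two further vectorized variants are sketched below as possible alternatives; they are not part of the original answer, and the variable names (combined, keep_regex, keep_fixed) are illustrative. Option 1 joins all patterns into a single alternation regex ("aa|bb|cc|dd") so grepl() is called only once; Option 2 treats the patterns as literal strings with fixed = TRUE and ORs one grepl() result per pattern.

```r
# Sample data from the question (character columns, not factors)
A <- data.frame(pattern = c("aa", "bb", "cc", "dd"), stringsAsFactors = FALSE)
B <- data.frame(name = c("aa1", "bb1", "abc", "def", "ddd"), stringsAsFactors = FALSE)

# Option 1: one alternation regex, a single grepl() call over all names
combined <- paste(A$pattern, collapse = "|")
keep_regex <- grepl(combined, B$name)

# Option 2: literal matching, element-wise OR of one logical vector per pattern
keep_fixed <- Reduce(`|`, lapply(A$pattern, grepl, x = B$name, fixed = TRUE))

# drop = FALSE keeps DT a data frame instead of collapsing it to a vector
DT <- B[keep_regex, , drop = FALSE]
DT$name
```

Option 1 interprets regex metacharacters such as . or | inside a pattern, whereas Option 2 matches them literally. Note also that an empty pattern table makes combined equal to "", which matches every name. To mirror ignore.case = T from the question, pass ignore.case = TRUE to Option 1; it cannot be combined with fixed = TRUE.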
