R: filter rows based on multiple partial strings applied to multiple columns

Question

Sample of the data set:

diag01 <- as.factor(c("S7211","J47","J47","K729","M2445","Z509","Z488","R13","L893","N318","L0311","S510","A047","D649"))
diag02 <- as.factor(c("K590","D761","J961","T501","M8580","R268","T831","G8240","B9688","G550","E162","T8902","E86","I849"))
diag03 <- as.factor(c("F058","M0820","E877","E86","G712","R32","A408","E888","G8220","C794","T68","L0310","M1094","D469"))
diag04 <- as.factor(c("E86","C845","R790","I420","G4732","R600","L893","R509","T913","C795","M8412","G8212","L891","L0311"))
diag05 <- as.factor(c("R001","N289","E876","E871","H659","R4589","N508","B99","I209","C773","T921","Q070","H919","L033"))
diag06 <- as.factor(c("I951","E877","S7240","I500","H901","E119","Z223","K590","I959","C509","G819","F719","Z290","R13"))

df <- data.frame(diag01, diag02, diag03, diag04, diag05, diag06)

I want to keep every row that has a partial string match anywhere in a given list of columns (e.g. diag01, diag02, ...). I can do this for a single column, e.g.

junk <- filter(df, grepl(pattern="^E11|^E16|^E86|^E87|^E88", diag02))

but I need to apply this to multiple columns (the original dataset has 216 columns and more than 1,000,000 rows). Among other options, I have tried

junk <- filter(df, grepl(pattern="^E11|^E16|^E86|^E87|^E88", df[,c(1:6)]))
junk <- apply(df, 1, function(r) any(r %in% grepl(pattern="^E11|^E16|^E86|^E87|^E88")))

I need the entire row, and ideally the filtering criteria should be restricted to a given list of columns, because values in other columns may also begin with the declared partial strings.

I made a genuine effort to search for a solution, but obviously my knowledge of R is lacking.

Recommended answer

Perhaps we need filter_all() with any_vars(), which tests every column of the data frame:

library(dplyr)
# Keep rows in which any column has a value starting with one of the prefixes
df %>%
   filter_all(any_vars(grepl(pattern="^(E11|E16|E86|E87|E88)", .)))
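
The question also asks for the match to be restricted to a given list of columns, which filter_all() does not do. A variant along the following lines may be closer to that requirement. It is only a sketch, assuming dplyr 1.0.4 or later (where if_any() is available); the names df_small, other, diag_cols and pattern are illustrative and not part of the original post.

```r
library(dplyr)

# Small illustrative data frame built from a few of the sample values above.
# `other` is a hypothetical column that must NOT take part in the match.
df_small <- data.frame(
  diag01 = c("S7211", "J47", "K729", "L0311"),
  diag02 = c("K590", "D761", "T501", "E162"),
  diag03 = c("F058", "E877", "E86", "T68"),
  other  = c("E86", "X1", "X2", "X3"),
  stringsAsFactors = TRUE
)

diag_cols <- c("diag01", "diag02", "diag03")  # columns to search
pattern   <- "^(E11|E16|E86|E87|E88)"         # prefixes from the question

# Keep rows where at least one of the chosen columns matches the pattern.
# grepl() coerces factor columns to character before matching.
matched <- df_small %>%
  filter(if_any(all_of(diag_cols), ~ grepl(pattern, .x)))

print(matched)
```

The first row is dropped even though its `other` value starts with E86, because only the columns named in diag_cols are inspected.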


Or using purrr and dplyr

library(dplyr)
library(purrr)
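# Build one logical vector per column, combine them with OR,
# then use the combined vector to subset the rows of df
df %>%
   map(~grepl(pattern="^E11|^E16|^E86|^E87|^E88", .)) %>% 
   reduce(`|`) %>%
   df[.,]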
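
The same map-then-reduce idea can be sketched in base R, which also makes it straightforward to search only a chosen subset of columns. This is a minimal sketch rather than a tuned solution for the full 216-column data set; dat, other, search_cols and prefix_pattern are illustrative names that do not appear in the original post.

```r
# Small illustrative data frame; `other` is a hypothetical column
# that should be ignored by the search.
dat <- data.frame(
  diag01 = c("S7211", "J47", "K729", "L0311"),
  diag02 = c("K590", "D761", "T501", "E162"),
  diag03 = c("F058", "E877", "E86", "T68"),
  other  = c("E86", "X1", "X2", "X3"),
  stringsAsFactors = TRUE
)

search_cols    <- c("diag01", "diag02", "diag03")
prefix_pattern <- "^(E11|E16|E86|E87|E88)"

# One logical vector per searched column (TRUE where the value matches).
hits <- lapply(dat[search_cols], function(col) grepl(prefix_pattern, col))

# Combine the per-column vectors with element-wise OR: a row is kept
# if any of the searched columns matched.
keep <- Reduce(`|`, hits)

# Subset the full data frame, so all columns (including `other`) are returned.
result <- dat[keep, , drop = FALSE]
print(result)
```

Working column by column keeps each grepl() call vectorised over all rows, which tends to scale better than a row-wise apply() on a data set with many rows.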
