Subset data based on partial match of column names


Question

I need to subset a data frame to keep certain columns. Some of them are identified by their full column names, and the following works fine:

testData[,c("FullColName1","FullColName2","FullColName3")]

My problem is that I also need to include columns whose names contain specific strings, and those strings may partially match some other column names. The strings include letters and symbols:

"PartString1()","PartString2()"

I tried putting wildcards around these. (Below, the word "star" stands in for the "*" symbol, which did not render correctly.)

testData[ ,c("FullColName1","FullColName2","FullColName3",
             "starPartString1()star","starPartString2()star")]

But I get the error message "undefined columns selected". I can't figure out whether, or how, I need grep to make this work.

Recommended answer

You mentioned that you may be looking for symbols, so for this particular example we can use [[:punct:]] as the regular expression. It matches every column name that contains a punctuation character.

d <- data.frame(1:3, 3:1, 11:13, 13:11, rep(1, 3))
names(d) <- c("FullColName1", "FullColName2", "FullColName3",
              "PartString1()","PartString2()")

d[grepl("[[:punct:]]", names(d))]
#   PartString1() PartString2()
# 1            13             1
# 2            12             1
# 3            11             1
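
To keep the full-name columns together with the partial matches, as the question asks, one option is to collect all the wanted names first and then index once. This is only a sketch: the variable names exact, partial, hits and keep are made up for illustration, and fixed = TRUE is used so that the parentheses are matched literally instead of being read as regex grouping.

```r
d <- data.frame(1:3, 3:1, 11:13, 13:11, rep(1, 3))
names(d) <- c("FullColName1", "FullColName2", "FullColName3",
              "PartString1()", "PartString2()")

# Columns requested by their complete names
exact <- c("FullColName1", "FullColName2")

# Substrings to look for; fixed = TRUE treats "(" and ")" as plain characters
partial <- c("String1()", "String2()")
hits <- unlist(lapply(partial, function(p) {
  grep(p, names(d), fixed = TRUE, value = TRUE)
}))

# unique() guards against a column being picked up twice
keep <- unique(c(exact, hits))
d[keep]
```

Because grep(..., value = TRUE) returns the matching names themselves, the result can be concatenated directly with the exact names.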

The same selection can also be written with str_detect() from the stringr package:

library(stringr)
d[str_detect(names(d), "[[:punct:]]")]
#   PartString1() PartString2()
# 1            13             1
# 2            12             1
# 3            11             1

Added per the OP's comment:

d[grepl("ring[12()]", names(d))]

This keeps the columns whose names contain ring1() or ring2(). Strictly, the character class [12()] matches "ring" followed by any one of 1, 2, ( or ), which is enough to pick out both columns here.
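
If the parentheses must actually be present in the name, a stricter pattern can escape them. A minimal sketch for comparison; the vector nms and the extra name String1_raw are hypothetical and only there to show where the two patterns differ:

```r
nms <- c("FullColName1", "PartString1()", "PartString2()", "String1_raw")

# Character class: "ring" followed by any ONE of 1, 2, ( or )
loose <- grepl("ring[12()]", nms)

# Escaped parentheses: "ring", then 1 or 2, then a literal "()"
strict <- grepl("ring[12]\\(\\)", nms)

nms[loose]   # also picks up "String1_raw"
nms[strict]  # only the names that really end in "()"
```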
