Positive and negative subsetting using dplyr::contains() and dplyr::select() in R


Problem description

I'm trying to achieve positive subsetting using a combination of dplyr::select() and dplyr::contains(), with the goal of subsetting by multiple string matches.

Minimal working example: starting from df1, negative subsetting produces df2 as expected. In contrast, when I attempt positive subsetting of df1, I get df3 (no columns), whereas I expected something like df4. Thanks for any help.

library(dplyr)
df1 <- data.frame("ppt_paint"=c(45,98,23), "het_heating"=c(1,1,2), "orm_wood"=c("QQ","OA","BB"), "hours"=c(4,6,4), "distance"=c(23,65,21))
# Negative subsetting: drops the three prefixed columns, as expected
df2 <- df1 %>% select(-contains("ppt_")) %>% select(-contains("het_")) %>% select(-contains("orm_"))
# Positive subsetting attempt: ends up with no columns
df3 <- df1 %>% select(contains("ppt_")) %>% select(contains("het_")) %>% select(contains("orm_"))
# Expected result of the positive subsetting
df4 <- data.frame("ppt_paint"=c(45,98,23), "het_heating"=c(1,1,2), "orm_wood"=c("QQ","OA","BB"))


Recommended answer

Think about what happens after df1 %>% select(contains("ppt_")), and look at the resulting data.frame. As requested, it retains only the column whose name contains "ppt_". The subsequent select() calls cannot work as you expect because the other columns are no longer there, no matter what you pass to select().
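To make the failure concrete, here is a minimal sketch that runs the chain one step at a time. The intermediate names step1 and step2 are introduced only for illustration; df1 is the data frame from the question.

```r
library(dplyr)

df1 <- data.frame("ppt_paint" = c(45, 98, 23), "het_heating" = c(1, 1, 2),
                  "orm_wood" = c("QQ", "OA", "BB"), "hours" = c(4, 6, 4),
                  "distance" = c(23, 65, 21))

# After the first select() only the "ppt_" column is left
step1 <- df1 %>% select(contains("ppt_"))
names(step1)

# The second select() searches step1, which has no "het_" column,
# so nothing matches and the result has no columns at all
step2 <- step1 %>% select(contains("het_"))
ncol(step2)
```

Negative selection does not have this problem, because each step removes columns and leaves the rest in place for the next step.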

You can keep the same idea but combine your three keys in a single select() call:

df1 %>% select(contains("ppt_"), contains("het_"), contains("orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB

Alternatively, you can use matches(), which accepts a regular expression:

df1 %>% select(matches("ppt_|het_|orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB
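If the keys live in a vector rather than being typed out, the regular expression can be assembled with paste(). This is only a sketch: the names keys, pattern and kept are hypothetical, and it assumes the keys contain no characters that are special in a regular expression.

```r
library(dplyr)

df1 <- data.frame("ppt_paint" = c(45, 98, 23), "het_heating" = c(1, 1, 2),
                  "orm_wood" = c("QQ", "OA", "BB"), "hours" = c(4, 6, 4),
                  "distance" = c(23, 65, 21))

# Hypothetical vector of keys (assumed free of regex metacharacters)
keys <- c("ppt_", "het_", "orm_")

# Join the keys with "|" to build the pattern "ppt_|het_|orm_"
pattern <- paste(keys, collapse = "|")

# Keep every column whose name matches at least one key
kept <- df1 %>% select(matches(pattern))
names(kept)
```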

Incidentally, you can also use it to shorten your "negative" indexing:

df1 %>% select(-matches("ppt_|het_|orm_"))
  hours distance
1     4       23
2     6       65
3     4       21
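A variation worth knowing, offered as a sketch rather than as part of the original answer: in tidyselect 1.0.0 and later, contains() itself accepts a character vector and takes the union of the matches, so literal keys can be used without building a regular expression. The names keys, kept and dropped are hypothetical.

```r
library(dplyr)

df1 <- data.frame("ppt_paint" = c(45, 98, 23), "het_heating" = c(1, 1, 2),
                  "orm_wood" = c("QQ", "OA", "BB"), "hours" = c(4, 6, 4),
                  "distance" = c(23, 65, 21))

# Hypothetical vector of literal keys (assumes tidyselect >= 1.0.0)
keys <- c("ppt_", "het_", "orm_")

# Positive subsetting: columns containing any of the keys
kept <- df1 %>% select(contains(keys))

# Negative subsetting: everything except those columns
dropped <- df1 %>% select(-contains(keys))

names(kept)
names(dropped)
```

Unlike matches(), contains() treats each key as a literal string, so keys with characters such as "." need no escaping.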
