Positive and negative subsetting using dplyr::contains() and dplyr::select() in R
Problem description
I'm trying to achieve positive subsetting using a combination of dplyr::select() and dplyr::contains(), with the goal of subsetting by multiple string matches.
Minimal working example: starting from df1, negative subsetting generates df2 as expected. In contrast, attempting positive subsetting of df1 generates df3 (no columns), when I'd have expected something like df4. Thanks for any help.
library(dplyr)
df1 <- data.frame(ppt_paint = c(45, 98, 23), het_heating = c(1, 1, 2), orm_wood = c("QQ", "OA", "BB"), hours = c(4, 6, 4), distance = c(23, 65, 21))
# Negative subsetting: drops the three prefixed columns, as expected
df2 <- df1 %>% select(-contains("ppt_")) %>% select(-contains("het_")) %>% select(-contains("orm_"))
# Positive subsetting: returns a data frame with no columns
df3 <- df1 %>% select(contains("ppt_")) %>% select(contains("het_")) %>% select(contains("orm_"))
# Expected result of the positive subsetting
df4 <- data.frame(ppt_paint = c(45, 98, 23), het_heating = c(1, 1, 2), orm_wood = c("QQ", "OA", "BB"))
Recommended answer
Think about (and take a look at the resulting data.frame) what happens after df1 %>% select(contains("ppt_")). As asked, it retains only the column whose name contains "ppt_". The subsequent select() calls cannot work as you expect, because the other columns are no longer there, no matter what you pass to select().
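To make the point above concrete, here is a minimal sketch that inspects the intermediate results of the chained calls. The names step1 and step2 are introduced only for illustration and are not part of the question.

```r
library(dplyr)

# Same df1 as in the question
df1 <- data.frame(
  ppt_paint = c(45, 98, 23),
  het_heating = c(1, 1, 2),
  orm_wood = c("QQ", "OA", "BB"),
  hours = c(4, 6, 4),
  distance = c(23, 65, 21)
)

# Step 1: only the column whose name contains "ppt_" survives
step1 <- df1 %>% select(contains("ppt_"))
names(step1)  # "ppt_paint"

# Step 2: "het_" is now searched among the remaining columns only,
# so nothing matches and no columns are left
step2 <- step1 %>% select(contains("het_"))
ncol(step2)   # 0
```

Each select() in a pipe works on the output of the previous one, so chained positive selections intersect rather than accumulate.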
You can keep the same idea but combine your three keys in a single select():
df1 %>% select(contains("ppt_"), contains("het_"), contains("orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB
Alternatively, you can use matches(), which accepts a regular expression:
df1 %>% select(matches("ppt_|het_|orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB
By the way, you can also use it to shorten your "negative" indexing:
df1 %>% select(-matches("ppt_|het_|orm_"))
  hours distance
1     4       23
2     6       65
3     4       21
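As a possible variation on the approaches above, contains() can also take a character vector of several literal strings and select the union of their matches. The sketch below assumes a reasonably recent tidyselect (1.0.0 or later, as far as I know); the names keys, kept and dropped are hypothetical and used only for illustration.

```r
library(dplyr)

# Same df1 as in the question
df1 <- data.frame(
  ppt_paint = c(45, 98, 23),
  het_heating = c(1, 1, 2),
  orm_wood = c("QQ", "OA", "BB"),
  hours = c(4, 6, 4),
  distance = c(23, 65, 21)
)

# Assumption: contains() takes the union of matches when given several strings
keys <- c("ppt_", "het_", "orm_")

kept <- df1 %>% select(contains(keys))      # positive subsetting
dropped <- df1 %>% select(-contains(keys))  # negative subsetting

names(kept)
names(dropped)
```

Unlike matches(), this treats each key as a literal string, so no regular-expression escaping is needed if a key contains special characters.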