Dplyr select based on multiple strings in a column
I have a data frame containing the following columns:
sample.data
  a_b_c d_b_e r_f_g c_b_a
1     1     1     1     1
2     2     2     2     2
3     3     3     3     3
4     4     4     4     4
How do I select only the columns whose names contain both, say, "a" and "c"?
To select variables whose names contain both "a" and "c", we could do:
library(dplyr)
df %>%
select(matches("(a.*c)|(c.*a)"))
  a_b_c c_b_a
1     1     1
2     2     2
3     3     3
4     4     4
Note that a_a_e is not selected because its name does not contain "c", and c_f_g is not selected because its name does not contain "a". Repeating one letter does not help: a_a_e has two "a"s but still no "c", so it is left out.
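To see why only two columns survive, the same pattern can be tried against the column names directly. This is a minimal sketch using base R's grepl on the names from the Data section; the variable names cols, pattern and hits are made up for illustration. One difference to keep in mind: matches() ignores case by default, while grepl is case-sensitive unless told otherwise.

```r
# Column names from the Data section of this answer
cols <- c("a_b_c", "a_a_e", "c_f_g", "c_b_a")

# Same pattern as in select(matches(...)): "a" followed by "c", or "c" followed by "a"
pattern <- "(a.*c)|(c.*a)"

# TRUE where a name contains both letters, in either order
hits <- grepl(pattern, cols)
setNames(hits, cols)

# Names that pass the test
cols[hits]
```

The two alternatives in the pattern are what make the order of the letters irrelevant; dropping either one would miss a_b_c or c_b_a.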
We could also use str_subset from stringr:
library(dplyr)
library(stringr)
df %>%
select(str_subset(names(df), "(a.*c)|(c.*a)"))
Data:
df <- data.frame(
a_b_c = 1:4,
a_a_e = 1:4,
c_f_g = 1:4,
c_b_a = 1:4
)
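As a possible alternative that avoids writing the regular expression by hand, the selection helpers can be combined with &. This is only a sketch and assumes dplyr 1.0.0 or later, where boolean operators on selection helpers are supported; the name both is made up for illustration. Like matches(), contains() ignores case by default.

```r
library(dplyr)

# Same example data as above
df <- data.frame(
  a_b_c = 1:4,
  a_a_e = 1:4,
  c_f_g = 1:4,
  c_b_a = 1:4
)

# Keep columns whose names contain "a" AND contain "c", in any order
both <- df %>%
  select(contains("a") & contains("c"))

names(both)
```

With more than two required substrings this form may be easier to extend than the regex, since each extra condition is one more contains() joined with &.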