Difference between contains() and matches() for select() in dplyr
Question
I have decided to spend some time learning dplyr thoroughly. I have just come across the select() function and some of the helper functions that come with it.
Just by playing around, I have failed to find any difference between the contains() and matches() helper functions.
Could someone please provide an example of how they can be used for different purposes?
Thanks
Recommended answer
The difference is that matches() takes a regular expression as the pattern used to select column names, while contains() does a literal match against a substring of the column name (or the full name). This is described in ?select_helpers as:
contains(): Contains a literal string.
matches(): Matches a regular expression.
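As a rough analogy only, the same literal-versus-regex distinction can be sketched in base R with grepl(): fixed = TRUE treats the pattern as a plain string, while the default treats it as a regex. The vector nms is a made-up set of column names, and this is not a claim about how the dplyr helpers are implemented.

```r
# Hypothetical column names, used only for illustration
nms <- c("colnm", "col1", "col2")

# Literal substring search, analogous to contains()
literal_hits <- nms[grepl("col\\d+", nms, fixed = TRUE)]

# Regex search, analogous to matches()
regex_hits <- nms[grepl("col\\d+", nms)]

print(literal_hits)  # character(0)
print(regex_hits)    # "col1" "col2"
```

No name contains the literal text col\d+, so the fixed search returns nothing, while the regex search finds the names where 'col' is followed by digits.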
Consider a simple example where we want to select the columns whose names contain the substring 'col':
library(dplyr)

df1 <- data.frame(colnm = 1:5, col1 = 24, col2 = 46)
df1 %>%
  select(contains("col"))
#   colnm col1 col2
# 1     1   24   46
# 2     2   24   46
# 3     3   24   46
# 4     4   24   46
# 5     5   24   46
Here, contains() matches 'col' literally in the column names and selects those columns. Now suppose we change the pattern to a regex: 'col' followed by one or more digits (\\d+).
df1 %>%
  select(contains("col\\d+"))
# data frame with 0 columns and 5 rows
This fails because contains() looks for the literal substring "col\\d+" in the column names, and no column name contains it.
df1 %>%
  select(matches("col\\d+"))
#   col1 col2
# 1   24   46
# 2   24   46
# 3   24   46
# 4   24   46
# 5   24   46
matches(), by contrast, interprets the pattern as a regex and selects the columns whose names match it.
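The two helpers can also differ in the other direction, when the pattern contains a character that is special in a regex. The sketch below assumes a made-up data frame df2 in which only one column name contains a literal dot; the column names and values are hypothetical.

```r
library(dplyr)

# Hypothetical data frame: only the first column name contains a literal dot
df2 <- data.frame(a.b = 1:3, axb = 4:6, ab = 7:9)

# contains() looks for the literal character "."
literal_cols <- df2 %>% select(contains(".")) %>% names()

# matches() reads "a.b" as a regex: "a", any single character, then "b"
regex_cols <- df2 %>% select(matches("a.b")) %>% names()

# Escaping the dot makes the regex match a literal "." again
escaped_cols <- df2 %>% select(matches("a\\.b")) %>% names()

print(literal_cols)  # "a.b"
print(regex_cols)    # "a.b" "axb"
print(escaped_cols)  # "a.b"
```

So contains() is the safer choice when the text you are looking for includes characters such as . or +, and matches() is the one to reach for when you need a pattern such as digits or anchors.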