Prevent grep in R from treating "." as a letter
Problem description
I have a character vector that contains text similar to the following:
text <- c("ABc.def.xYz", "ge", "lmo.qrstu")
I would like to remove everything up to and including the last "." in each element:
> "xYz" "ge" "qrstu"
However, the grep function seems to be treating "." as a letter:
pattern <- "([A-Z]|[a-z])+$"
grep(pattern, text, value = TRUE)
> "ABc.def.xYz" "ge" "lmo.qrstu"
The pattern works elsewhere, such as on regexpal.
How can I get grep to behave as expected?
Solution
grep is for finding a pattern. It returns the indices of the vector elements that match the pattern. If value = TRUE is specified, it returns the matching elements themselves, in full. Here every element ends in a run of letters, so all three match and are returned unchanged; "." is not being treated as a letter. From the description, it seems that you want to remove a substring rather than return a subset of the initial vector.
If you need to remove the substring, you can use sub:
sub('.*\\.', '', text)
#[1] "xYz"   "ge"    "qrstu"
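To illustrate the difference between finding and extracting, here is a minimal sketch that reuses the text and pattern from the question. The variable names idx, vals, and tails are made up for this example, and the regmatches/regexpr step is an addition not mentioned in the answer; it suggests that the original pattern does pick out the trailing letters once it is used for extraction rather than for subsetting.

```r
text <- c("ABc.def.xYz", "ge", "lmo.qrstu")
pattern <- "([A-Z]|[a-z])+$"

# grep() only reports WHICH elements contain a match somewhere.
idx <- grep(pattern, text)                  # positions of matching elements
vals <- grep(pattern, text, value = TRUE)   # the same elements, returned whole

# regexpr() locates the match inside each element;
# regmatches() then pulls out just the matched substring.
tails <- regmatches(text, regexpr(pattern, text))

print(idx)
print(vals)
print(tails)
```

Note that regmatches drops elements with no match, so the result can be shorter than the input when some elements do not end in a letter; sub keeps one output per input element.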
In the first argument we match the pattern '.*\\.'. It matches zero or more characters (.*) followed by a dot (\\.). The \\ is needed to escape the "." so that it is treated as a literal dot instead of any character. Because .* is greedy, the match extends to the last "." in the string. We replace the matched pattern with '' (the replacement argument) and thereby remove the substring.
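The effect of escaping and of greediness can be checked with a few variations on the same call. This is a sketch only: the variable names are hypothetical, and the bracket form [.] and the lazy quantifier with perl = TRUE are alternatives added here for comparison, not part of the original answer.

```r
text <- c("ABc.def.xYz", "ge", "lmo.qrstu")

# Escaped dot, greedy .* : removes everything through the LAST dot.
greedy <- sub(".*\\.", "", text)

# A character class is another way to write a literal dot.
bracket <- sub(".*[.]", "", text)

# Lazy .*? (PCRE syntax) stops at the FIRST dot instead.
lazy <- sub(".*?\\.", "", text, perl = TRUE)

# Unescaped dot: the final "." matches any character,
# so the whole string is consumed and replaced.
unescaped <- sub(".*.", "", text)

print(greedy)
print(bracket)
print(lazy)
print(unescaped)
```

Elements without a dot, such as "ge", are left untouched by the escaped variants because the pattern does not match them at all.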