How do I deal with special characters like \^$.?*|+()[{ in my regex?

Question

I want to match a regular expression special character, \^$.?*|+()[{. I tried:

x <- "a[b"
grepl("[", x)
## Error: invalid regular expression '[', reason 'Missing ']''

(Equivalently, stringr::str_detect(x, "[") or stringi::stri_detect_regex(x, "[").)

Doubling the value to escape it doesn't work:

grepl("[[", x)
## Error: invalid regular expression '[[', reason 'Missing ']''

Neither does using a backslash:

grepl("\[", x)
## Error: '\[' is an unrecognized escape in character string starting ""\["

How do I match special characters?

Some special cases of this question already exist. They are old and well written enough that it would be cheeky to close them as duplicates of this one:
Escaped Periods In R Regular Expressions
How to escape a question mark in R?
escaping pipe ("|") in a regex

Recommended answer

Escape with a double backslash

R treats backslashes as escape values for character constants. (... and so do regular expressions. Hence the need for two backslashes when supplying a character argument for a pattern. The first one isn't actually a character, but rather it makes the second one into a character.) You can see how they are processed using cat.

y <- "double quote: \", tab: \t, newline: \n, unicode point: \u20AC"
print(y)
## [1] "double quote: \", tab: \t, newline: \n, unicode point: €"
cat(y)
## double quote: ", tab:    , newline: 
## , unicode point: €

To use special characters in a regular expression the simplest method is usually to escape them with a backslash, but as noted above, the backslash itself needs to be escaped.

grepl("\\[", "a[b")
## [1] TRUE
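
When the text to match arrives in a variable, typing the backslashes by hand is not an option. The following is a minimal sketch of one way to escape every special character programmatically. The helper name escape_regex is hypothetical (it is not part of base R or of the original answer), and it only targets the characters listed in the question plus the closing ] ) and }.

```r
# Hypothetical helper (an assumption, not a base R function): put a backslash
# in front of each regex special character so the string matches literally.
# In the bracket expression, "]" comes first so it is taken as a literal, and
# the backslash is doubled for the regex engine, then doubled again for R.
escape_regex <- function(x) {
  gsub("([][\\\\^$.?*|+(){}])", "\\\\\\1", x)
}

escaped <- escape_regex("a[b")
cat(escaped, "\n")
## a\[b

grepl(escaped, "a[b")
## [1] TRUE

# Without escaping, "." matches any character; with escaping it does not.
grepl("a.b", "axb")
## [1] TRUE
grepl(escape_regex("a.b"), "axb")
## [1] FALSE
```

The hyphen is left alone because it is only special inside a character class.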

To match backslashes, you need to double escape, resulting in four backslashes.

grepl("\\\\", c("a\\b", "a\nb"))
## [1]  TRUE FALSE
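
If you are on R 4.0.0 or later, raw string constants are an alternative to counting backslashes. This sketch assumes that version; the r"(...)" syntax is a parse error on older releases. Inside a raw string, R does not process backslashes, so only the regex level of escaping remains.

```r
# Raw strings need R >= 4.0.0.
pattern <- r"(\\)"   # two characters: the regex \\, which matches one backslash

identical(pattern, "\\\\")
## [1] TRUE

grepl(pattern, c("a\\b", "a\nb"))
## [1]  TRUE FALSE

# The opening bracket from the question needs only one backslash this way.
grepl(r"(\[)", "a[b")
## [1] TRUE
```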

The rebus package contains constants for each of the special characters to save you mistyping slashes.

library(rebus)
OPEN_BRACKET
## [1] "\\["
BACKSLASH
## [1] "\\\\"

For more examples see:

?SpecialCharacters

Your problem can be solved this way:

library(rebus)
grepl(OPEN_BRACKET, "a[b")

Form a character class

You can also wrap the special characters in square brackets to form a character class.

grepl("[?]", "a?b")
## [1] TRUE

Two of the special characters have special meaning inside character classes: \ and ^.

Backslash still needs to be escaped even if it is inside a character class.

grepl("[\\\\]", c("a\\b", "a\nb"))
## [1]  TRUE FALSE

Caret only needs to be escaped if it is directly after the opening square bracket.

grepl("[ ^]", "a^b")  # matches spaces as well.
## [1] TRUE
grepl("[\\^]", "a^b") 
## [1] TRUE
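
The reason the position matters is that a caret directly after the opening bracket negates the class. A short sketch of the difference, using made-up inputs:

```r
# A leading caret negates: "[^^]" means "any character except a caret".
grepl("[^^]", c("^", "a"))
## [1] FALSE  TRUE

# Anywhere else in the class, the caret is a literal.
grepl("[a^]", c("^", "b"))
## [1]  TRUE FALSE
```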

rebus also lets you form a character class.

char_class("?")
## <regex> [?]

Use a pre-existing character class

If you want to match all punctuation, you can use the [:punct:] character class.

grepl("[[:punct:]]", c("//", "[", "(", "{", "?", "^", "$"))
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
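
The same class works in replacement functions. As a small illustration with an invented input string, base R's gsub can strip every punctuation character in one call:

```r
# Remove all punctuation, including regex special characters, from a string.
messy <- "a[b]?c{d}"
gsub("[[:punct:]]", "", messy)
## [1] "abcd"
```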

stringi maps this to the Unicode General Category for punctuation, so its behaviour is slightly different: Unicode classifies ^ and $ as symbols rather than punctuation, so they are not matched.

stringi::stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "[[:punct:]]")
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

You can also use the cross-platform syntax for accessing a UGC.

stringi::stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "\\p{P}")
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

Use \Q \E escapes

Placing characters between \\Q and \\E makes the regular expression engine treat them literally rather than as regular expressions.

grepl("\\Q.\\E", "a.b")
## [1] TRUE
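
This is convenient for quoting a whole string at once. The sketch below wraps arbitrary text in \\Q and \\E; the helper name quote_literal is hypothetical, perl = TRUE is my choice rather than a requirement stated in the answer, and the helper assumes the input does not itself contain \E.

```r
# Hypothetical helper (an assumption): quote a whole string for literal matching.
# It does not handle input that already contains "\E".
quote_literal <- function(x) paste0("\\Q", x, "\\E")

candidates <- c("1+1=2?", "11=2")

# Unquoted, "+" and "?" act as quantifiers, so the wrong element matches.
grepl("1+1=2?", candidates)
## [1] FALSE  TRUE

# Quoted, the pattern matches only the literal text.
grepl(quote_literal("1+1=2?"), candidates, perl = TRUE)
## [1]  TRUE FALSE
```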

rebus lets you write literal blocks of regular expressions.

literal(".")
## <regex> \Q.\E

Don't use regular expressions

Regular expressions are not always the answer. If you want to match a fixed string then you can do, for example:

grepl("[", "a[b", fixed = TRUE)
stringr::str_detect("a[b", stringr::fixed("["))
stringi::stri_detect_fixed("a[b", "[")
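
The fixed = TRUE argument is also accepted by the other base R pattern functions, such as gsub and strsplit. A brief sketch with invented inputs shows how much the result can differ:

```r
# With fixed = TRUE the dot is an ordinary character.
gsub(".", "-", "a.b.c", fixed = TRUE)
## [1] "a-b-c"

# As a regex, the dot matches every character.
gsub(".", "-", "a.b.c")
## [1] "-----"

# Splitting on a literal pipe without any escaping.
strsplit("a|b|c", "|", fixed = TRUE)[[1]]
## [1] "a" "b" "c"
```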
