Escaping pipe ("|") in a regex


Question

I need to split a string on words and end marks (punctuation of certain types). Oddly, the pipe ("|") can count as an end mark. I have code that works on end marks until I try to add the pipe. Adding the pipe makes strsplit split on every character, and escaping it causes an error. How can I include the pipe in the regular expression?

x <- "I like the dog|."

strsplit(x, "[[:space:]]|(?=[.!?*-])", perl=TRUE)
#[[1]]
#[1] "I"    "like" "the"  "dog|" "."   

strsplit(x, "[[:space:]]|(?=[.!?*-\|])", perl=TRUE)
#Error: '\|' is an unrecognized escape in character string starting "[[:space:]]|(?=[.!?*-\|"

Desired result:

#[[1]]
#[1] "I"    "like" "the"  "dog"  "|"  "."  #pipe is an element

Recommended answer

One way to solve this is to use the \Q...\E notation, which removes the special meaning of any characters placed between \Q and \E. As ?regex says:

If you want to remove the special meaning from a sequence of characters, you can do so by putting them between ‘\Q’ and ‘\E’. This is different from Perl in that ‘$’ and ‘@’ are handled as literals in ‘\Q...\E’ sequences in PCRE, whereas in Perl, ‘$’ and ‘@’ cause variable interpolation.

For example:

> strsplit(x, "[[:space:]]|(?=[\\Q.!?*-|\\E])", perl=TRUE)
[[1]]
[1] "I"    "like" "the"  "dog"  "|"    "."
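
It may also help to see why the original attempts failed. Inside a character class, `*-|` is read as a range from `*` to `|`, which covers the digits and letters, so the lookahead matches before nearly every character. Separately, `\|` is rejected by R's string parser before the regex engine ever sees it, because a backslash has to be doubled in an R string literal. The following is a minimal sketch, assuming the same `x` as in the question, that moves the hyphen to the end of the class so it is taken literally (the names `res` and `res2` are only for illustration):

```r
x <- "I like the dog|."

# Hyphen placed last in the class, so it is a literal "-" rather than the
# start of a "*" to "|" range. The pipe needs no escaping inside [...].
res <- strsplit(x, "[[:space:]]|(?=[.!?*|-])", perl = TRUE)
print(res[[1]])

# If explicit escapes are preferred, each backslash must be doubled in the
# R string so that the regex engine receives \- and \|.
res2 <- strsplit(x, "[[:space:]]|(?=[.!?*\\-\\|])", perl = TRUE)
print(identical(res, res2))
```

Both patterns should give the same six elements as the \Q...\E version above. Putting the hyphen last avoids escaping altogether, which keeps the pattern a little easier to read.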
