Regex issue in gsub

Posted: 2020/11/21 18:43:57. Tags: r, regex, gsub

I have defined

vec <- "5f 110y, Fast"

and

gsub("[\\s0-9a-z]+,", "", vec)

gives "5f  Fast" (with two spaces between "5f" and "Fast").

I would have expected it to give "Fast" since everything before the comma should get matched by the regex.

Can anyone explain to me why this is not the case?

Keep in mind that in TRE regex patterns (the default engine for gsub), you cannot use regex escapes like \s, \d, \w inside bracket expressions.

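So the regex in your case, "[\\s0-9a-z]+,", matches one or more characters that are each a \, an s, a digit, or a lowercase ASCII letter, followed by a single comma. The space after "5f" is not in that set, so the match can only start at "110y", and "5f " is left behind.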
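
The behaviour described above can be checked directly. This is a minimal sketch, assuming the default (TRE) engine; the variable names m_tre, in_brackets and outside are only illustrative.

```r
vec <- "5f 110y, Fast"

# Default engine (TRE): extract what the pattern actually matches.
m_tre <- regmatches(vec, regexpr("[\\s0-9a-z]+,", vec))
print(m_tre)          # "110y,"

# Inside [...], "\\s" is not a whitespace class under TRE:
# a space is not matched, but a literal "s" is.
in_brackets <- grepl("[\\s]", c(" ", "s"))
print(in_brackets)    # FALSE TRUE

# Outside [...], TRE does treat "\\s" as whitespace.
outside <- grepl("\\s", " ")
print(outside)        # TRUE
```

Using regmatches() with regexpr() shows the matched text itself, which tends to be easier to reason about than the string that remains after gsub().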

You may use POSIX character classes inside the bracket expression instead, like [:space:] (any whitespace) or [:blank:] (horizontal whitespace):

> gsub("[[:space:]0-9a-z]+,", "", vec)
[1] " Fast"
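
The result above still carries a leading space, while the question expected exactly "Fast". Assuming that is the goal, one possible sketch is to let the pattern also consume whitespace after the comma, or to trim the result afterwards; no_lead and trimmed are illustrative names.

```r
vec <- "5f 110y, Fast"

# Option 1: also match any whitespace that follows the comma.
no_lead <- gsub("[[:space:]0-9a-z]+,[[:space:]]*", "", vec)
print(no_lead)    # "Fast"

# Option 2: keep the original pattern and trim the result.
trimmed <- trimws(gsub("[[:space:]0-9a-z]+,", "", vec))
print(trimmed)    # "Fast"
```

Option 1 keeps everything in one regex; option 2 may be easier to read when leading or trailing whitespace should be removed regardless of the match.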

Or use a PCRE regex with \s by passing the perl=TRUE argument:

> gsub("[\\s0-9a-z]+,", "", vec, perl=TRUE)
[1] " Fast"

To make \s match all Unicode whitespace, add the (*UCP) PCRE verb at the start of the pattern: gsub("(*UCP)[\\s0-9a-z]+,", "", vec, perl=TRUE).
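
As a small illustration of the (*UCP) verb, the sketch below uses a no-break space (U+00A0), which is Unicode whitespace but not ASCII whitespace. The test string x is made up for this example, and it assumes R's PCRE library was built with Unicode property support, which is the usual case.

```r
x <- "a\u00a0b"   # "a", NO-BREAK SPACE (U+00A0), "b"

# With (*UCP), \s uses Unicode properties, so U+00A0 is removed.
ucp <- gsub("(*UCP)\\s", "", x, perl = TRUE)
print(ucp)        # "ab"

# Without (*UCP), whether U+00A0 counts as \s can depend on the
# locale and PCRE build, so no expected output is given here.
no_ucp <- gsub("\\s", "", x, perl = TRUE)
print(nchar(no_ucp))
```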
