How to use word break, asterisk, word break in Regex with Perl?
I have a complex precompiled regular expression in Perl. For most cases the regex is fine: it matches everything it should and nothing it shouldn't. Except for one point.
Basically my regex looks like:
my $regexp = qr/\b(FOO|BAR|\*)\b/;
Unfortunately, m/\b\*\b/ won't match "example, *". Only m/\*/ will, which I can't use because of false positives. Is there any workaround?
From the comments: the false positives are "**", "example*", and "exam*ple".
What is the regex intended for? It should extract keywords (one of which is a single asterisk) that coworkers have entered into product data. The goal is to move this information out of a free-text field into an atomic one.
It sounds like you want to treat * as a word character.
\b is equivalent to
(?x: (?<!\w)(?=\w) | (?<=\w)(?!\w) )
so you want
(?x: (?<![\w*])(?=[\w*]) | (?<=[\w*])(?![\w*]) )
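As a quick check of the two boundary definitions above, here is a minimal sketch that wraps a literal asterisk in each of them and tries the strings from the question. The variable names $word_boundary and $star_boundary are made up for this illustration and do not appear in the original post.

```perl
use strict;
use warnings;

# Expanded form of \b, as given above.
my $word_boundary = qr/(?x: (?<!\w)(?=\w) | (?<=\w)(?!\w) )/;

# Same construction, but with '*' counted as a word character.
# ($star_boundary is a hypothetical name used only in this sketch.)
my $star_boundary = qr/(?x: (?<![\w*])(?=[\w*]) | (?<=[\w*])(?![\w*]) )/;

for my $text ('example, *', 'exam*ple', '**') {
    my $plain    = $text =~ /\b\*\b/                            ? 'match' : 'no match';
    my $expanded = $text =~ /${word_boundary}\*${word_boundary}/ ? 'match' : 'no match';
    my $custom   = $text =~ /${star_boundary}\*${star_boundary}/ ? 'match' : 'no match';
    print "'$text': \\b => $plain, expanded \\b => $expanded, custom => $custom\n";
}
```

With plain \b, the lone asterisk in "example, *" has no word character on either side, so there is no boundary to match, while "exam*ple" does match because the asterisk sits between two word characters. The custom boundary reverses both results.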
Applied, you get the following:
qr/
(?: (?<![\w*])(?=[\w*]) | (?<=[\w*])(?![\w*]) )
(FOO|BAR|\*)
(?: (?<![\w*])(?=[\w*]) | (?<=[\w*])(?![\w*]) )
/x
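But given our knowledge of the middle expression, that can be simplified. The middle always starts and ends with a character in [\w*], so on the left only the (?<![\w*]) check can fail, and on the right only the (?![\w*]) check can fail. That leaves the following: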
qr/(?<![\w*])(FOO|BAR|\*)(?![\w*])/
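To see the simplified pattern at work, the following sketch runs it over the strings mentioned in the question. The helper extract_keywords is a hypothetical name introduced here for illustration only; the pattern itself is the one from the answer.

```perl
use strict;
use warnings;

# Simplified pattern from the answer: a keyword must be neither
# preceded nor followed by a word character or an asterisk.
my $regexp = qr/(?<![\w*])(FOO|BAR|\*)(?![\w*])/;

# Hypothetical helper (not part of the original post):
# returns every keyword captured in the given text.
sub extract_keywords {
    my ($text) = @_;
    my @found = $text =~ /$regexp/g;
    return @found;
}

for my $text ('example, *', 'FOO and BAR', '**', 'example*', 'exam*ple') {
    my @found = extract_keywords($text);
    printf "%-14s => %s\n", "'$text'", @found ? join(', ', @found) : '(no match)';
}
```

The first two strings yield keywords ("*", and "FOO" plus "BAR"), while the three false positives from the question produce no match.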