Match from last occurrence using regex in Perl
Question
I have text like this:
hello world /* select a from table_b
*/ some other text with new line cha
racter and there are some blocks of
/* any string */ select this part on
ly
////RESULT rest string
The text is multiline, and I need to extract everything from the last occurrence of "*/" up to "////RESULT". In this case, the result should be:
select this part on
ly
How can I achieve this in Perl?
I have tried \\\*/(.|\n)*////RESULT, but that starts matching at the first "*/".
Recommended answer
A useful trick in cases like this is to prefix the regexp with the greedy pattern .*, which will try to match as many characters as possible before the rest of the pattern matches. So:
my ($match) = ($string =~ m!^.*\*/(.*?)////RESULT!s);
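To see the one-liner above in action, here is a minimal runnable sketch that applies it to the sample text from the question. The variable $trimmed and the trimming step are additions for illustration only; the capture itself keeps the space after "*/" and the newline before "////RESULT".

```perl
use strict;
use warnings;

# Sample text from the question; the quoted heredoc keeps it verbatim.
my $string = <<'END';
hello world /* select a from table_b
*/ some other text with new line cha
racter and there are some blocks of
/* any string */ select this part on
ly
////RESULT rest string
END

# List context: the contents of the capture group are assigned to $match.
my ($match) = ($string =~ m!^.*\*/(.*?)////RESULT!s);
die "no match\n" unless defined $match;

# Strip leading and trailing whitespace from a copy of the capture.
(my $trimmed = $match) =~ s/^\s+|\s+$//g;
print "$trimmed\n";
```

If the surrounding whitespace is never wanted, one option is to keep the pattern unchanged and trim afterwards, as done here, so the regexp stays easy to read.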
Let's break this pattern into its components:
^.* starts at the beginning of the string and matches as many characters as it can. (The s modifier allows . to match even newlines.) The beginning-of-string anchor ^ is not strictly necessary, but it ensures that the regexp engine won't waste too much time backtracking if the match fails.
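The effect of the greedy prefix and of the s modifier can be checked on a small input. This is only a sketch; the string in $text and the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Two "*/" markers, separated by a newline.
my $text = "a */ first\n*/ second ////RESULT";

# Without the greedy prefix, the match starts at the leftmost "*/".
my ($from_first) = $text =~ m!\*/(.*?)////RESULT!s;

# With ^.* the engine backs off from the end of the string,
# so the first "*/" it finds is the last one.
my ($from_last) = $text =~ m!^.*\*/(.*?)////RESULT!s;

# Without the s modifier, "." cannot cross the newline, so the
# pattern cannot reach ////RESULT from the first line and fails.
my ($no_s) = $text =~ m!^.*\*/(.*?)////RESULT!;

print "from first: [$from_first]\n";
print "from last:  [$from_last]\n";
print "without s:  ", (defined $no_s ? "[$no_s]" : "no match"), "\n";
```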
\*/ just matches the literal string */.
(.*?) matches and captures any number of characters; the ? makes it ungreedy, so it prefers to match as few characters as possible in case there's more than one position where the rest of the regexp can match.
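The difference between the ungreedy and the greedy capture only shows when the end marker occurs more than once. A small sketch, with an invented input string, assuming two ////RESULT markers after the comment:

```perl
use strict;
use warnings;

# One "*/" followed by two end markers.
my $text = "/* c */ keep ////RESULT tail ////RESULT end";

# Ungreedy capture: stops at the first ////RESULT after the "*/".
my ($lazy) = $text =~ m!^.*\*/(.*?)////RESULT!s;

# Greedy capture: runs on to the last ////RESULT.
my ($greedy) = $text =~ m!^.*\*/(.*)////RESULT!s;

print "lazy:   [$lazy]\n";
print "greedy: [$greedy]\n";
```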
Finally, ////RESULT just matches itself.
Since the pattern contains a lot of slashes, and since I wanted to avoid leaning toothpick syndrome, I decided to use alternative regexp delimiters. Exclamation points (!) are a popular choice, since they don't collide with any normal regexp syntax.
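For comparison, here is the same pattern written with the default slash delimiters, with exclamation points, and with braces. The input string and variable names are invented for the sketch; all three forms should capture the same text.

```perl
use strict;
use warnings;

my $text = "x /* c */ body ////RESULT";

# Default delimiters: every slash in the pattern needs a backslash.
my ($with_slashes) = $text =~ /^.*\*\/(.*?)\/\/\/\/RESULT/s;

# Exclamation points as delimiters: slashes can be written as-is.
my ($with_bangs) = $text =~ m!^.*\*/(.*?)////RESULT!s;

# Paired braces work as well, as long as the pattern has no unbalanced braces.
my ($with_braces) = $text =~ m{^.*\*/(.*?)////RESULT}s;

print "same result\n"
    if $with_slashes eq $with_bangs && $with_bangs eq $with_braces;
```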
Per the discussion with ikegami below, I should note that if you want to use this regexp as a sub-pattern in a longer regexp, and you want to guarantee that the string matched by (.*?) will never contain ////RESULT, then you should wrap those parts of the regexp in an independent (?>) subexpression, like this:
my $regexp = qr!\*/(?>(.*?)////RESULT)!s;
...
my ($match) = ($string =~ /^.*$regexp$some_other_regexp/s);
The (?>) causes the pattern inside it to fail rather than accept a suboptimal match (i.e. one that extends beyond the first substring matching ////RESULT), even if that means that the rest of the regexp will fail to match.
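The following sketch illustrates that behaviour. The input string and the $suffix pattern are hypothetical stand-ins for $some_other_regexp, chosen so that the trailing pattern can only match after the second ////RESULT.

```perl
use strict;
use warnings;

# Two end markers; only the second is followed by " done".
my $text   = "/* c */ A ////RESULT B ////RESULT done";
my $suffix = qr/ done/;    # hypothetical stand-in for $some_other_regexp

my $plain  = qr!\*/(.*?)////RESULT!s;        # no independent group
my $atomic = qr!\*/(?>(.*?)////RESULT)!s;    # independent (?>) group

# Without (?>), backtracking stretches (.*?) past the first ////RESULT
# so that the suffix can match.
my ($loose) = $text =~ /^.*$plain$suffix/s;

# With (?>), the group commits to the first ////RESULT; the suffix cannot
# match there, so the whole match fails.
my ($strict) = $text =~ /^.*$atomic$suffix/s;

print "plain:  [$loose]\n";
print "atomic: ", (defined $strict ? "[$strict]" : "no match"), "\n";
```

Which behaviour is preferable depends on the caller: the independent group trades a possible match for the guarantee that the capture never contains the end marker.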