Non-greedy regular expression match for multicharacter delimiters in awk
Question
Consider the string "AB 1 BA 2 AB 3 BA". How can I match the content between "AB" and "BA" in a non-greedy fashion (in awk)?
I have tried the following:
awk '
BEGIN {
    str = "AB 1 BA 2 AB 3 BA"
    regex = "AB([^B][^A]|B[^A]|[^B]A)*BA"
    if (match(str, regex))
        print substr(str, RSTART, RLENGTH)
}'
This produces no output. I believe the match fails because there is an odd number of characters between "AB" and "BA". If I replace str with "AB 11 BA 22 AB 33 BA", the regex seems to work.
Recommended answer
Merge your two negated character classes and remove the [^A] from the second alternation:
regex = "AB([^AB]|B|[^B]A)*BA"
This regex fails on the string ABABA, though; I'm not sure if that is a problem.
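To make the behaviour concrete, here is a minimal sketch that runs the regex above against the strings from the question and against ABABA. The shell function name first_match is hypothetical and only there for illustration; the awk body is the question's own code with the regex swapped in.

```shell
# Hypothetical helper: print the first AB...BA match in the argument, or nothing.
first_match() {
    awk -v str="$1" '
    BEGIN {
        regex = "AB([^AB]|B|[^B]A)*BA"
        if (match(str, regex))
            print substr(str, RSTART, RLENGTH)
    }'
}

first_match "AB 1 BA 2 AB 3 BA"     # prints: AB 1 BA
first_match "AB 11 BA 22 AB 33 BA"  # prints: AB 11 BA
first_match "ABABA"                 # prints nothing (the failure case noted above)
```

The match stops at the first BA because, once the group has consumed a B, no alternative can consume the A that follows it.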
Explanation:
AB          # Match AB
(           # Group 1 (awk has no non-capturing groups)
  [^AB]     # Match any character except A or B
  |         # or
  B         # Match B
  |         # or
  [^B]A     # Match any character except B, then A
)*          # Repeat as needed
BA          # Match BA
Since the only way to match an A in the alternation is by matching a character other than B before it, we can safely use the simple B as one of the alternatives.
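If the ABABA case does matter, one possible workaround is to skip the regex and search for the delimiters as plain strings with awk's index() function. This is a different technique from the one in the answer, sketched here under the assumption that only the text between the delimiters is wanted; the names between, ldelim and rdelim are hypothetical.

```shell
# Hypothetical helper: print the text between each ldelim and the nearest
# following rdelim, one segment per line. Plain string search, not a regex.
between() {
    awk -v str="$1" -v ldelim="AB" -v rdelim="BA" '
    BEGIN {
        while ((i = index(str, ldelim)) > 0) {
            rest = substr(str, i + length(ldelim))   # text after the opening delimiter
            j = index(rest, rdelim)                  # nearest closing delimiter
            if (j == 0)
                break                                # unterminated segment: stop
            print substr(rest, 1, j - 1)
            str = substr(rest, j + length(rdelim))   # continue after the closing delimiter
        }
    }'
}

between "AB 1 BA 2 AB 3 BA"   # prints " 1 " and " 3 " on separate lines
between "ABABA"               # prints "A"
```

Because index() always returns the first occurrence, each segment ends at the nearest closing delimiter, which gives the non-greedy behaviour without depending on the character parity of the content.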