Non-greedy regular expression match for multicharacter delimiters in awk

Question

Consider the string "AB 1 BA 2 AB 3 BA". How can I match the content between "AB" and "BA" in a non-greedy fashion (in awk)?

I have tried the following:

awk '
BEGIN {
    str="AB 1 BA 2 AB 3 BA"
    regex="AB([^B][^A]|B[^A]|[^B]A)*BA"
    if (match(str,regex))
        print substr(str,RSTART,RLENGTH)
}'

This produces no output. I believe the reason for no match is that there is an odd number of characters between "AB" and "BA": every alternative inside the group consumes exactly two characters. If I replace str with "AB 11 BA 22 AB 33 BA", the regex seems to work.
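The odd/even observation can be checked directly. The following is a minimal sketch, assuming a POSIX awk; the shell function name try_original is hypothetical and only wraps the pattern from the question so it can be run against both strings.

```sh
# try_original STRING (hypothetical helper): apply the question's original
# pattern to STRING and print the match, or "no match" if there is none.
try_original() {
    awk -v str="$1" '
    BEGIN {
        # Every alternative in the group consumes exactly two characters,
        # so only an even number of characters can sit between AB and BA.
        regex = "AB([^B][^A]|B[^A]|[^B]A)*BA"
        if (match(str, regex))
            print substr(str, RSTART, RLENGTH)
        else
            print "no match"
    }'
}

try_original "AB 1 BA 2 AB 3 BA"      # three characters between the delimiters
try_original "AB 11 BA 22 AB 33 BA"   # four characters between the delimiters
```

The first call should print "no match" and the second "AB 11 BA", which is consistent with the parity explanation.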

Recommended answer

Merge your two negated character classes and remove the [^A] from the second alternation:

regex = "AB([^AB]|B|[^B]A)*BA"

This regex fails on the string ABABA, though - not sure if that is a problem.
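Both the suggested pattern and its ABABA caveat can be tried out with a small wrapper. This is only a sketch, assuming a POSIX awk; the function name try_fixed is made up for illustration.

```sh
# try_fixed STRING (hypothetical helper): apply the suggested pattern to
# STRING and print the match, or "no match" if there is none.
try_fixed() {
    awk -v str="$1" '
    BEGIN {
        regex = "AB([^AB]|B|[^B]A)*BA"
        if (match(str, regex))
            print substr(str, RSTART, RLENGTH)
        else
            print "no match"
    }'
}

try_fixed "AB 1 BA 2 AB 3 BA"   # the string from the question
try_fixed "ABABA"               # the caveat mentioned above
```

The first call should print "AB 1 BA", stopping at the first closing delimiter instead of running on to the last one. The second should print "no match": the A that follows the opening AB cannot be consumed by any alternative, and the text at that point is not BA either.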

Explanation:

AB       # Match AB
(        # Group 1 (could also be non-capturing)
 [^AB]   # Match any character except A or B
|        # or
 B       # Match B
|        # or
 [^B]A   # Match any character except B, then A
)*       # Repeat as needed
BA       # Match BA

Since the only way to match an A in the alternation is by matching a character except B before it, we can safely use the simple B as one of the alternatives.
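Because the group can never consume a B followed by an A, each match ends at the first closing delimiter, so the pattern can be applied repeatedly to pull out every delimited segment. The loop below is one possible sketch, assuming a POSIX awk; the function name list_segments and the square brackets around each result (added only to make the surrounding spaces visible) are illustrative choices, not part of the original answer.

```sh
# list_segments STRING (hypothetical helper): print the text found between
# each AB ... BA pair in STRING, one segment per line, wrapped in brackets.
list_segments() {
    awk -v str="$1" '
    BEGIN {
        regex = "AB([^AB]|B|[^B]A)*BA"
        while (match(str, regex)) {
            # Skip the two-character delimiter at each end of the match.
            print "[" substr(str, RSTART + 2, RLENGTH - 4) "]"
            # Resume the search just past the current match.
            str = substr(str, RSTART + RLENGTH)
        }
    }'
}

list_segments "AB 1 BA 2 AB 3 BA"
```

For the string from the question this should print "[ 1 ]" and "[ 3 ]" on separate lines. Note that values passed with -v undergo backslash-escape processing in awk, so input containing backslashes would need a different way of passing the string.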
