Non-greedy matching with grep


Question

As far as I know, non-greedy matching is not part of Basic Regular Expressions (BRE) or Extended Regular Expressions (ERE). However, the behaviour of different versions of grep (BSD and GNU) seems to suggest otherwise.

For example, take the following string:

string="hello_my_dear_polo"

Using GNU grep:

The following are a few attempts to extract hello from the string.

BRE Attempt:

$ grep -o "hel.*\?o" <<< "$string"
hello_my_dear_polo

The output is the entire string, which suggests that the non-greedy quantifier does not work in BRE. Note that I have escaped only the ?, since * does not lose its meaning in BRE and need not be escaped.

ERE Attempt:

$ grep -oE "hel.*?o" <<< "$string"
hello_my_dear_polo

Enabling the -E option yields the same output, suggesting that non-greedy matching is not part of ERE either. Escaping was not needed here since we are using ERE.
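
As a point of comparison, here is a minimal sketch of the same search with the ? dropped entirely. It pipes the string through printf rather than using a here-string so that it does not depend on bash. The variable name greedy is only for illustration. If the result matches the output above, that is consistent with the trailing ? having no lazy effect in this ERE attempt.

```shell
string="hello_my_dear_polo"

# Plain greedy ERE: .* extends to the last "o" on the line,
# because POSIX matching prefers the longest match.
greedy=$(printf '%s\n' "$string" | grep -oE 'hel.*o')
printf '%s\n' "$greedy"
```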

PCRE Attempt:

$ grep -oP "hel.*?o" <<< "$string"
hello

Enabling the -P option for PCRE shows that the non-greedy quantifier is part of PCRE, and hence we get the desired output of hello. Escaping was not needed here since we are using PCRE.

Using BSD grep:

Here are a few attempts to extract hello from the string.

BRE Attempt:

$ grep -o "hel.*\?o" <<< "$string"

Using BRE I get no output from BSD grep.

ERE Attempt:

$ grep -oE "hel.*?o" <<< "$string"
hello

After enabling the -E option, I was surprised to get my desired output. My question is about the output of this attempt.

PCRE Attempt:

$ grep -oP "hel.*?o" <<< "$string"
usage: grep [-abcDEFGHhIiJLlmnOoPqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
    [-e pattern] [-f file] [--binary-files=value] [--color=when]
    [--context[=num]] [--directories=action] [--label] [--line-buffered]
    [--null] [pattern] [file ...]

Using the -P option gave me a usage error, which was expected since this BSD version of grep does not support PCRE.

So my question is: why does ERE with a non-greedy quantifier yield the desired output on BSD grep but not on GNU grep?

Is this a bug, an undocumented feature of BSD egrep, or my misunderstanding of the output?

Solution

The double quantifier is simply a syntax error and could result in either an error message or undefined behavior. It would arguably be better if you got an error message.

Perl extensions to regex post-date POSIX by a large margin; at the time these tools were written, it was extremely unlikely that someone would try to use this wacky syntax for anything. Non-greedy matching was only introduced in Perl 5, in the mid-1990s.
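
Since the double quantifier cannot be relied on, one common way to get the same result without -P is a negated bracket expression. This is a sketch rather than part of the original answer, and it only works here because the terminator is a single character (o). The variable name first is hypothetical.

```shell
string="hello_my_dear_polo"

# [^o]* cannot cross an "o", so the match has to end at the first "o"
# after "hel". This is ordinary POSIX ERE, with no lazy quantifier.
first=$(printf '%s\n' "$string" | grep -oE 'hel[^o]*o')
printf '%s\n' "$first"
```

The pattern uses only standard ERE syntax, so it should behave the same way in GNU and BSD grep. For a multi-character terminator this trick does not carry over directly, and a PCRE-capable tool is usually the simpler choice.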
