Why does a non-greedy quantifier sometimes not work in Oracle regex?

Question

IMO, this query should return A=1,B=2,

SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.*?,') as A_and_B FROM dual

But it returns the whole string A=1,B=2,C=3, instead. Why?

UPD: Oracle 10.2+ is required to use Perl-style metacharacters in regular expressions.

UPD2:
A clearer form of my question (to avoid questions about the Oracle version and the availability of the Perl-style regex extensions):
Why, ON THE SAME SYSTEM, does a non-greedy quantifier sometimes work as expected and sometimes not?

This works correctly:

regexp_substr('A=1,B=2,C=3,', 'B=.*?,')

This doesn't work:

regexp_substr('A=1,B=2,C=3,', '.*B=.*?,')

Fiddle

UPD3:
Yes, it seems to be a bug.
Can anyone share Oracle Support's reaction to this issue?
Is the bug already known?
Does it have an ID?

Recommended answer

It's a bug!

You are right that in Perl, 'A=1,B=2,C=3,' =~ /.*B=.*?,/; print $& prints A=1,B=2,
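
As a cross-check outside Oracle and Perl, the same two patterns from the question can be run through another backtracking engine with Perl-style lazy quantifiers. The sketch below uses Python's standard re module, which is an assumption of this example and not something the original post used; the variable names are illustrative only.

```python
import re

source = "A=1,B=2,C=3,"

# Pattern from the question: greedy prefix, lazy value part.
with_prefix = re.search(r".*B=.*?,", source).group(0)

# Same pattern without the leading greedy ".*".
without_prefix = re.search(r"B=.*?,", source).group(0)

print(with_prefix)     # A=1,B=2,
print(without_prefix)  # B=2,
```

In this engine the lazy `.*?` stops at the first comma after `B=` whether or not a greedy `.*` precedes it, which is the behavior the question expected from Oracle.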

What you have stumbled upon is a bug that still exists in Oracle Database 11g R2. If the exact same regular expression operator (excluding the greediness modifier) appears twice in a regular expression, both occurrences will have the greediness indicated by the first appearance regardless of the greediness specified by the second one. That this is a bug is clearly demonstrated by these results:

SQL> SELECT regexp_substr('A=1,B=2,C=3,', '[^B]*B=[^Bx]*?,') as good FROM dual;

GOOD
--------
A=1,B=2,

SQL> SELECT regexp_substr('A=1,B=2,C=3,', '[^B]*B=[^B]*?,') as bad FROM dual;

BAD
-----------
A=1,B=2,C=3,

The only difference between the two regular expressions is that the "good" one excludes 'x' as a possible match in the second matching list. Since 'x' does not appear in the target string, excluding it should make no difference, but as you can see, removing the 'x' makes a big difference. That has to be a bug.
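
The claim that excluding 'x' should make no difference can be checked against a reference engine. This is a minimal sketch using Python's re module (an assumption of this example, chosen only because it implements Perl-style lazy quantifiers); it says nothing about how Oracle evaluates the patterns internally.

```python
import re

source = "A=1,B=2,C=3,"

# The "good" and "bad" patterns from the Oracle session above.
good = re.search(r"[^B]*B=[^Bx]*?,", source).group(0)
bad = re.search(r"[^B]*B=[^B]*?,", source).group(0)

print(good)  # A=1,B=2,
print(bad)   # A=1,B=2,
```

Both patterns give the same match here, so the differing Oracle results cannot be explained by the patterns themselves.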

Here are some more examples from Oracle 11.2: (SQL Fiddle with even more examples)

SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.*?,')  FROM dual; =>  A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.*,')   FROM dual; =>  A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.*?B=.*?,') FROM dual; =>  A=1,B=2,
SELECT regexp_substr('A=1,B=2,C=3,', '.*?B=.*,')  FROM dual; =>  A=1,B=2,
-- Changing second operator from * to +
SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.+?,')  FROM dual; =>  A=1,B=2,
SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.+,')   FROM dual; =>  A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.+B=.+,')   FROM dual; =>  A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.+?B=.+,')  FROM dual; =>  A=1,B=2,

The pattern is consistent: the greediness of the first occurrence is used for the second occurrence whether it should be or not.
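
The rule described above can be written down as a small model and compared with the reported outputs. The sketch below is a toy under stated assumptions: it rewrites a pattern so that every repeat of the same atom-plus-quantifier inherits the laziness of its first occurrence, then runs the rewritten pattern through Python's re module. The names `apply_described_bug` and `model_substr` are hypothetical, the tokenizer only handles simple patterns (no groups), and nothing here reflects Oracle's actual implementation.

```python
import re

# One token: an atom (bracket list, escaped char, or single char),
# optionally followed by a quantifier and an optional lazy marker "?".
TOKEN = re.compile(r"(\[\^?\]?[^\]]*\]|\\.|.)(?:([*+?]|\{\d+(?:,\d*)?\})(\?)?)?")

def apply_described_bug(pattern):
    """Give each repeated operator the greediness of its first occurrence."""
    first_lazy = {}

    def rewrite(m):
        atom, quant, lazy = m.group(1), m.group(2), m.group(3) or ""
        if quant is None:
            return m.group(0)  # unquantified atom: leave unchanged
        key = atom + quant
        # setdefault stores the first occurrence's laziness and reuses it later.
        return key + first_lazy.setdefault(key, lazy)

    return TOKEN.sub(rewrite, pattern)

def model_substr(source, pattern):
    m = re.search(apply_described_bug(pattern), source)
    return m.group(0) if m else None

SOURCE = "A=1,B=2,C=3,"
WHOLE, SHORT = "A=1,B=2,C=3,", "A=1,B=2,"

# Pattern -> output reported for Oracle 11.2 in the answer above.
REPORTED = {
    ".*B=.*?,": WHOLE, ".*B=.*,": WHOLE,
    ".*?B=.*?,": SHORT, ".*?B=.*,": SHORT,
    ".*B=.+?,": SHORT, ".*B=.+,": WHOLE,
    ".+B=.+,": WHOLE, ".+?B=.+,": SHORT,
    "[^B]*B=[^Bx]*?,": SHORT, "[^B]*B=[^B]*?,": WHOLE,
}

for pattern, reported in REPORTED.items():
    print(pattern, "->", model_substr(SOURCE, pattern), "| reported:", reported)
```

For these ten patterns the model reproduces every reported result, which is consistent with the explanation in the answer but does not prove it is the actual cause.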
