Why doesn't a non-greedy quantifier sometimes work in Oracle regex?

Question

IMO, this query should return A=1,B=2,

SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.*?,') as A_and_B FROM dual

But it returns the whole string, A=1,B=2,C=3,, instead. Why?

Update 1:

Oracle 10.2+ is required to use Perl-style metacharacters in regular expressions.

Update 2:

A clearer form of my question (to avoid questions about the Oracle version and the availability of Perl-style regex extensions):

On the same system, why does a non-greedy quantifier sometimes work as expected and sometimes not?

This works as expected:

regexp_substr('A=1,B=2,C=3,', 'B=.*?,')

This does not:

regexp_substr('A=1,B=2,C=3,', '.*B=.*?,')

Fiddle
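For reference, the sketch below runs both patterns through Python's standard re module. It stands in for a Perl-style backtracking engine here, which is an assumption: it is not Oracle's engine. SUBJECT and first_match are hypothetical names. first_match returns the leftmost match, which is roughly what REGEXP_SUBSTR returns with its default arguments.

```python
import re
from typing import Optional

SUBJECT = "A=1,B=2,C=3,"


def first_match(pattern: str, subject: str = SUBJECT) -> Optional[str]:
    """Return the leftmost match of pattern in subject, or None."""
    match = re.search(pattern, subject)
    return match.group(0) if match else None


# The two patterns from Update 2, evaluated by a Perl-style backtracking engine.
print(first_match("B=.*?,"))    # B=2,
print(first_match(".*B=.*?,"))  # A=1,B=2,
```

In this engine the lazy .*? stops at the first comma after B= whether or not a greedy .* precedes it, which matches the expectation stated in the question.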

Update 3:

Yes, it seems to be a bug.

How has Oracle Support responded to this issue?

Is the bug already known? Does it have an ID?

Recommended answer

It's a BUG!

You are right: in Perl, 'A=1,B=2,C=3,' =~ /.*B=.*?,/; print $&; prints A=1,B=2,

What you have stumbled upon is a bug that still exists in Oracle Database 11g R2. If the exact same regular expression atom (including the quantifier but excluding the greediness modifier) appears twice in a regular expression, both occurrences will have the greediness indicated by the first appearance regardless of the greediness specified by the second one. That this is a bug is clearly demonstrated by these results (here, "the exact same regular expression atom" is [^B]*):

SQL> SELECT regexp_substr('A=1,B=2,C=3,', '[^B]*B=[^Bx]*?,') as good FROM dual;

GOOD
--------
A=1,B=2,

SQL> SELECT regexp_substr('A=1,B=2,C=3,', '[^B]*B=[^B]*?,') as bad FROM dual;

BAD
-----------
A=1,B=2,C=3,

The only difference between the two regular expressions is that the "good" one excludes 'x' as a possible match in the second matching list. Since 'x' does not appear in the target string, excluding it should make no difference, but as you can see, removing the 'x' makes a big difference. That has to be a bug.
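The same pair can be checked against a Perl-style engine. This sketch again uses Python's re module as an assumed stand-in for Perl. GOOD and BAD are just the two patterns from the SQL*Plus session. Here both return the same match, which is what the argument above says a correct engine must do.

```python
import re

SUBJECT = "A=1,B=2,C=3,"
GOOD = "[^B]*B=[^Bx]*?,"
BAD = "[^B]*B=[^B]*?,"

# 'x' never occurs in SUBJECT, so excluding it from the second class
# cannot change which characters that class is able to match.
good = re.search(GOOD, SUBJECT).group(0)
bad = re.search(BAD, SUBJECT).group(0)

print(good)  # A=1,B=2,
print(bad)   # A=1,B=2,
```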

Here are some more examples from Oracle 11.2: (SQL Fiddle with even more examples)

SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.*?,')  FROM dual; -- => A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.*,')   FROM dual; -- => A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.*?B=.*?,') FROM dual; -- => A=1,B=2,
SELECT regexp_substr('A=1,B=2,C=3,', '.*?B=.*,')  FROM dual; -- => A=1,B=2,
-- Using + instead of * (first for the second quantifier only, then for both)
SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.+?,')  FROM dual; -- => A=1,B=2,
SELECT regexp_substr('A=1,B=2,C=3,', '.*B=.+,')   FROM dual; -- => A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.+B=.+,')   FROM dual; -- => A=1,B=2,C=3,
SELECT regexp_substr('A=1,B=2,C=3,', '.+?B=.+,')  FROM dual; -- => A=1,B=2,

The pattern is consistent: the greediness of the first occurrence is used for the second occurrence whether it should be or not.
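The rule can also be written as a small model and checked against the results above. This sketch is hypothetical: apply_reported_bug and tokenize are illustrative helpers, not anything Oracle provides, and this is not how Oracle's engine is implemented.

The model rewrites a pattern so that every repeated atom-plus-quantifier pair takes the greediness of its first occurrence, as the answer describes. It then hands the rewritten pattern to Python's re module. The tokenizer only understands the syntax used in these examples: literal characters, '.', simple bracket classes, and * or + with an optional lazy ?.

```python
import re
from typing import List, Optional, Tuple

SUBJECT = "A=1,B=2,C=3,"


def tokenize(pattern: str) -> List[Tuple[str, str, bool]]:
    """Split a pattern into (atom, quantifier, lazy) triples.

    Assumption: only literal characters, '.', bracket classes without nested
    or escaped ']', and the quantifiers '*' and '+' (optionally lazy) appear.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i + 1)
            atom, i = pattern[i:end + 1], end + 1
        else:
            atom, i = pattern[i], i + 1
        quant, lazy = "", False
        if i < len(pattern) and pattern[i] in "*+":
            quant, i = pattern[i], i + 1
            if i < len(pattern) and pattern[i] == "?":
                lazy, i = True, i + 1
        tokens.append((atom, quant, lazy))
    return tokens


def apply_reported_bug(pattern: str) -> str:
    """Hypothetical model: a repeated atom+quantifier inherits the first one's greediness."""
    first_lazy = {}
    parts = []
    for atom, quant, lazy in tokenize(pattern):
        if quant:
            # setdefault records the first occurrence's laziness and returns
            # that same value for every later occurrence of the pair.
            lazy = first_lazy.setdefault((atom, quant), lazy)
        parts.append(atom + quant + ("?" if lazy else ""))
    return "".join(parts)


def first_match(pattern: str, subject: str = SUBJECT) -> Optional[str]:
    match = re.search(pattern, subject)
    return match.group(0) if match else None


PATTERNS = (".*B=.*?,", ".*B=.*,", ".*?B=.*?,", ".*?B=.*,",
            ".*B=.+?,", ".*B=.+,", ".+B=.+,", ".+?B=.+,",
            "[^B]*B=[^Bx]*?,", "[^B]*B=[^B]*?,")

for p in PATTERNS:
    print(p, "| correct:", first_match(p),
          "| modelled bug:", first_match(apply_reported_bug(p)))
```

Under these assumptions, the "modelled bug" column matches each of the Oracle 11.2 results listed above, including the GOOD/BAD pair. The "correct" column shows what a Perl-style engine returns.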
