Postgresql regexp_matches 语法未按预期工作 [英] Postgresql regexp_matches syntax not working as expected
问题描述
我使用 Postgres regexp_matches 函数来提取数字.
我使用的正则表达式是
4([\s\-\/\.]*?0){3}([\s\-\/\.]*?[12]){1}([\s\-\/\.]*?\d){4}
如果我使用像
顺便说一句,-
在括号表达式的开始/结束时不需要转义,也不需要转义/
和<代码>. 那里.我还建议在任何支持限制量词的正则表达式中将 {1}
删除为 a
= a{1}
.
I use the Postgres regexp_matches function to extract numbers.
The regular expression I use is
4([\s\-\/\.]*?0){3}([\s\-\/\.]*?[12]){1}([\s\-\/\.]*?\d){4}
If I use a tool like https://regexr.com/ to verify if it's working and I apply the following test set
4-0001-1234
5-2342-2344
499999999
4-0001-1234 4.0001.12344 4-0-0-0-1-1234
I get the expected extraction result:
4-0001-1234
4-0001-1234
4.0001.1234
4-0-0-0-1-1234
However, if I use the same expression in Postgresql, it does work well:
SELECT unnest(regexp_matches('4-0001-1234', '4([\s\-\/\.]*?0){3}([\s\-\/\.]*?[12]){1}([\s\-\/\.]*?\d){4}', 'g'));
Result:
0
1
4
What I suspect is, that it has to do with the greediness and/or that the quantifiers like {3} are not applied in the right way. Or it uses the Posix standard for regular expressions, which always seem to differ a little bit from the Java Syntax.
Any suggestions why it is not working and how to fix it?
The regexp_matches(string text, pattern text [, flags text])
function returns the captured values:
Return all captured substrings resulting from matching a POSIX regular expression against the string.
You may fix the expression using non-capturing groups:
SELECT unnest(regexp_matches('4-0001-1234 4.0001.12344 4-0-0-0-1-1234', '4(?:[\s/.-]*0){3}(?:[\s/.-]*[12])(?:[\s/.-]*\d){4}', 'g'));
See the online demo.
BTW, you do not need to escape -
when it is at the start/end of the bracket expression, and there is no need to escape neither /
nor .
there. I also suggest removing {1}
as a
= a{1}
in any regex supporting limiting quantifiers.
这篇关于Postgresql regexp_matches 语法未按预期工作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!