How the regular expression OR operator is evaluated

Question

T-SQL 中,我生成了,答案很简单: regex引擎处理表达式和输入字符串,从 left 正确 .

以您的模式为例, ^.{8} |.{12} $ |.{4} 开始从左侧检查输入字符串,并检查^.{8} -前8个字符.找到他们,这是一场比赛.然后,继续并使用.{12} $ 查找最后12个字符,并且再次有一个匹配项.然后,将匹配任何4个字符的字符串.

In T-SQL I generated a UNIQUEIDENTIFIER using the NEWID() function. For example:

723952A7-96C6-421F-961F-80E66A4F29D2

Then, all dashes (-) are removed and it looks like this:

723952A796C6421F961F80E66A4F29D2

Now I need to turn the string above back into a valid UNIQUEIDENTIFIER in the format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx by inserting the dashes again.

To achieve this, I am using a RegexMatches function implemented in C# with SQL CLR, together with the regular expression ^.{8}|.{12}$|.{4}, which gives me this:

SELECT *
FROM [dbo].[RegexMatches] ('723952A796C6421F961F80E66A4F29D2', '^.{8}|.{12}$|.{4}')
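Outside SQL Server, the same matching can be reproduced with a short Python sketch. It assumes that Python's `re` module treats this pattern the same way as .NET's `Regex` (both try the alternatives in written order at each position); `re.findall` stands in for the CLR `RegexMatches` function, and `rebuild_guid` is a hypothetical helper name, not part of either library.

```python
import re

# Pattern from the question: first 8 chars, last 12 chars, otherwise 4 chars.
GUID_PIECES = re.compile(r"^.{8}|.{12}$|.{4}")


def rebuild_guid(hex32: str) -> str:
    """Hypothetical helper: reinsert the dashes into a 32-character GUID string."""
    pieces = GUID_PIECES.findall(hex32)  # matches are returned in input order
    return "-".join(pieces)


compact = "723952A796C6421F961F80E66A4F29D2"
print(GUID_PIECES.findall(compact))
# ['723952A7', '96C6', '421F', '961F', '80E66A4F29D2']
print(rebuild_guid(compact))
# 723952A7-96C6-421F-961F-80E66A4F29D2
```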

Using the above, I can easily rebuild a correct UNIQUEIDENTIFIER, but I am wondering how the OR operator is evaluated in the regular expression. For example, the following does not work:

SELECT *
FROM [dbo].[RegexMatches] ('723952A796C6421F961F80E66A4F29D2', '^.{8}|.{4}|.{12}$')

Is it guaranteed that the first regular expression will match the start and the end of the string first, then the other values, and that it always returns the matches in this order? (I will have issues if, for example, 96C6 is returned after 421F.)

Answer

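If you are asking what happens when you use the | alternation operator, the answer is simple: the regex engine processes the expression and the input string from left to right. At each position in the input, it tries the alternatives in the order they are written and keeps the first one that succeeds; if none succeeds, it moves one character to the right and tries again. After a match, the search resumes where that match ended, so the matches are returned in the order they appear in the input.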
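The rule can be made concrete with a small Python sketch that imitates it by hand: at each position, try the alternatives in the order they are written, keep the first one that matches, and otherwise move one character to the right. This is a simplified illustration that assumes none of the alternatives can match an empty string and ignores backtracking from later parts of a larger pattern; `find_alternation_matches` is a hypothetical name. It relies on the documented behavior of Python's `Pattern.match(string, pos)`, where `^` only matches at the real start of the string and `$` at the real end.

```python
import re
from typing import List


def find_alternation_matches(text: str, alternatives: List[str]) -> List[str]:
    """Hypothetical hand-rolled scan that mimics how `a|b|c` is tried."""
    compiled = [re.compile(alt) for alt in alternatives]
    matches = []
    pos = 0
    while pos <= len(text):
        for alt in compiled:
            # match(text, pos) anchors the attempt at pos; '^' still means the
            # real start of text and '$' the real end of text.
            m = alt.match(text, pos)
            if m:
                matches.append(m.group())
                # Resume after the match; guard against zero-length matches.
                pos = m.end() if m.end() > pos else pos + 1
                break
        else:
            pos += 1  # no alternative matched here: move one character right
    return matches


compact = "723952A796C6421F961F80E66A4F29D2"
alts = [r"^.{8}", r".{12}$", r".{4}"]
# The hand-rolled scan agrees with the real engine on this input.
assert find_alternation_matches(compact, alts) == re.findall("|".join(alts), compact)
```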

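Taking your pattern ^.{8}|.{12}$|.{4} as an example, the engine starts at the left of the input string and checks ^.{8}, the first 8 characters. It finds them, so that is the first match. From position 8 onward (0-based), ^ can no longer match, and .{12}$ fails because the 12 characters starting there do not reach the end of the string, so .{4} matches instead. The same happens at positions 12 and 16. Only at position 20, where exactly 12 characters remain, does .{12}$ succeed and match the last 12 characters. The matches therefore come back as 8, 4, 4, 4 and 12 characters, in the order they occur in the string.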
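To see which alternative fired at each position, each branch can be wrapped in its own capturing group; Python's `Match.lastindex` then reports which group (and therefore which branch) matched. A minimal Python sketch, assuming Python's `re` behaves like .NET's `Regex` for this pattern:

```python
import re

# Same alternation as in the question, with each branch in its own group.
TRACED = re.compile(r"(^.{8})|(.{12}$)|(.{4})")
BRANCHES = {1: "^.{8}", 2: ".{12}$", 3: ".{4}"}

compact = "723952A796C6421F961F80E66A4F29D2"
for m in TRACED.finditer(compact):
    # m.start() is where the match begins; m.lastindex is the group that matched.
    print(f"{m.start():>2}  {BRANCHES[m.lastindex]:<7} {m.group()}")

# Prints:
#  0  ^.{8}   723952A7
#  8  .{4}    96C6
# 12  .{4}    421F
# 16  .{4}    961F
# 20  .{12}$  80E66A4F29D2
```

The `.{12}$` branch only wins at position 20, the one position where exactly 12 characters remain; at positions 8, 12 and 16 it fails and `.{4}` wins instead.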

Next, you have ^.{8}|.{4}|.{12}$. The expression is again evaluated from left to right: the first 8 characters are matched first, but after that only 4-character sequences are matched. .{12}$ never fires, because wherever it could match, .{4} is tried first and also matches!
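Running both orderings through Python's `re.findall` (again assuming `re` behaves like .NET's `Regex` for these patterns) shows the difference: with `.{4}` listed before `.{12}$`, the last 12 characters are split into three 4-character pieces.

```python
import re

compact = "723952A796C6421F961F80E66A4F29D2"

good = re.findall(r"^.{8}|.{12}$|.{4}", compact)
bad = re.findall(r"^.{8}|.{4}|.{12}$", compact)

print(good)  # ['723952A7', '96C6', '421F', '961F', '80E66A4F29D2']
print(bad)   # ['723952A7', '96C6', '421F', '961F', '80E6', '6A4F', '29D2']

# The .{12}$ branch never contributes a match in the reordered pattern.
assert all(len(piece) != 12 for piece in bad)
```

Listing the more specific `.{12}$` branch before the catch-all `.{4}` is what makes the original pattern work.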
