How the regular expression OR operator is evaluated
Problem description
In T-SQL I have generated a UNIQUEIDENTIFIER using the NEWID() function. For example:
723952A7-96C6-421F-961F-80E66A4F29D2
Then, all dashes (-) are removed and it looks like this:
723952A796C6421F961F80E66A4F29D2
Now, I need to turn the string above back into a valid UNIQUEIDENTIFIER in the format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx by setting the dashes again.
To achieve this, I am using a SQL CLR implementation of the C# RegexMatches function with the regular expression ^.{8}|.{12}$|.{4}, which gives me this:
SELECT *
FROM [dbo].[RegexMatches] ('723952A796C6421F961F80E66A4F29D2', '^.{8}|.{12}$|.{4}')
Using the above, I can easily rebuild a correct UNIQUEIDENTIFIER, but I am wondering how the OR operator is evaluated in the regular expression. For example, the following will not work:
SELECT *
FROM [dbo].[RegexMatches] ('723952A796C6421F961F80E66A4F29D2', '^.{8}|.{4}|.{12}$')
Is it guaranteed that the first regular expression will first match the start and the end of the string, then the other values, and will always return the matches in this order? (I will have issues if, for example, 96C6 is matched after 421F.)
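If you are interested in what happens when you use the | alternation operator, the answer is easy: the regex engine processes both the expression and the input string from left to right. At each position in the input it tries the alternatives in the order they are written and takes the first one that succeeds.
Taking your pattern ^.{8}|.{12}$|.{4} as an example, the engine starts inspecting the input string from the left and checks ^.{8}, the first 8 characters. It finds them, and that is a match. It then continues from position 8: ^.{8} cannot match there, and .{12}$ cannot either, because 12 characters from that point do not reach the end of the string, so .{4} matches. The same happens for the next two 4-character groups. Once only 12 characters remain, .{12}$ is tried before .{4} and matches the rest of the string. The matches are therefore returned in the order they occur in the input: 8, 4, 4, 4 and 12 characters.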
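The scan described above can be checked with a small sketch. It uses Python's re module as a stand-in for the .NET Regex class behind RegexMatches; that substitution is an assumption, justified only by the fact that both are backtracking engines that try alternatives left to right and this pattern uses no engine-specific syntax. The helper name list_matches is hypothetical.

```python
import re

# The dash-less GUID and the pattern from the question.
GUID_DIGITS = "723952A796C6421F961F80E66A4F29D2"
PATTERN = r"^.{8}|.{12}$|.{4}"


def list_matches(pattern, text):
    """Return (start offset, matched text) pairs in the order the engine finds them."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, text)]


matches = list_matches(PATTERN, GUID_DIGITS)
for start, value in matches:
    print(start, value)

# Joining the pieces with dashes rebuilds the original identifier.
rebuilt = "-".join(value for _, value in matches)
print(rebuilt)  # 723952A7-96C6-421F-961F-80E66A4F29D2
```

The start offsets come out as 0, 8, 12, 16 and 20, so the pieces arrive in input order and 96C6 is always reported before 421F.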
Next, take ^.{8}|.{4}|.{12}$. The expression is again evaluated from left to right: the first 8 characters are matched first, but after that only 4-character sequences are matched. .{12}$ never fires, because .{4} comes before it in the alternation and succeeds at every position where .{12}$ could have matched.
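A minimal sketch of the reordered pattern shows the effect. As an assumption, Python's re module again stands in for the .NET engine used by RegexMatches, and the helper name matched_pieces is hypothetical.

```python
import re

GUID_DIGITS = "723952A796C6421F961F80E66A4F29D2"

# Same alternatives as in the question, but .{4} now precedes .{12}$.
REORDERED = r"^.{8}|.{4}|.{12}$"


def matched_pieces(pattern, text):
    """Return the matched substrings in the order the engine finds them."""
    return [m.group() for m in re.finditer(pattern, text)]


pieces = matched_pieces(REORDERED, GUID_DIGITS)
print(pieces)
# ['723952A7', '96C6', '421F', '961F', '80E6', '6A4F', '29D2']

# No 12-character piece is produced: .{4} wins at every position
# where .{12}$ could have matched.
print(any(len(piece) == 12 for piece in pieces))  # False
```

The last 12 characters are split into three 4-character matches, which is why this variant cannot be used to rebuild the UNIQUEIDENTIFIER.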