Regular Expressions, understanding lookbehind in combination with the or operator

Views: 125 | Published: 2020/7/1 4:50:56 | Tags: regex, sublimetext3, regex-lookarounds


Question

This is more a question of understanding than an actual problem. The situation is as follows. I have some floating-point numbers (e.g. amounts of money) between two quotation marks "".

Examples:

"1,23"
"12,23"
"123,23"

Now I want to match the comma in those expressions. I built the following regex, which works for me:

(?<=\"[0-9]|[0-9]{2})(,)(?=[0-9]{2}\")

The part I don't completely understand is the lookbehind in combination with the or "|". But let's break it up:

(
?<=             //Start of the lookbehind
\"              //Starting with an escaped quotation mark "
[0-9]           //Followed by a digit between 0 and 9

Now I had the problem that the quotation mark is not always followed by just one digit, as you can see in examples 2 and 3. A range quantifier such as {1,3} did not work within the lookbehind, as I found out in another Stack Overflow question.

So I decided to use the or "|" operator, as suggested there:

|[0-9]{2}       //Or followed by two digits between 0 and 9
)

The interesting part is that it also matches the comma in the third example, "123,23". I don't really understand why. I also don't know why I don't have to repeat the starting quotation mark after the or "|" operator, because I thought the complete lookbehind up to the or operator would have to be modified or repeated, e.g.:

(?<=\"[0-9]|\"[0-9]{2})(,)(?=[0-9]{2}\")            //This however does not work at all

So, in my understanding, the regular expression that matches all three examples should look like the following:

(?<=\"[0-9]|\"[0-9]{2}|\"[0-9]{3})(,)(?=[0-9]{2}\")

or at least (if someone can explain the missing \"):

(?<=\"[0-9]|[0-9]{2}|[0-9]{3})(,)(?=[0-9]{2}\")

I hope someone is able to help me understand the situation.

// If it is of special interest: I used this regex on a plain text file in the Sublime Text 3 editor, to search for the comma and replace it.

Recommended answer

You are right,

(?<=\"[0-9]|\"[0-9]{2}|\"[0-9]{3})(,)(?=[0-9]{2}\")

should be the right regex in this case.

About why you "don't need the \" for two and three digits": you actually do need it.

(?<=\"[0-9]|[0-9]{2}|[0-9]{3})(,)(?=[0-9]{2}\")

will match 12,23" and 123,23" as well, because the alternatives without \" only check the digits directly before the comma.

EDIT: It looks like the problem is that Sublime doesn't allow variable-length lookbehinds, even if the alternatives are listed with |. This means (?<=\"[0-9]|\"[0-9]{2}|\"[0-9]{3}) will fail, because the alternatives are not of the same size: 2, 3 and 4 characters. It also explains why your original pattern is accepted and matches the comma in "123,23": both of its alternatives are two characters long, and [0-9]{2} matches the 23 directly before the comma.

This is because Sublime seems to be using the Boost library regexes. Its documentation states:

Lookbehind

(?<=pattern) consumes zero characters, only if pattern could be matched against the characters preceding the current position (pattern must be of fixed length).

(?<!pattern) consumes zero characters, only if pattern could not be matched against the characters preceding the current position (pattern must be of fixed length).
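
The fixed-length rule can be tried outside Sublime as well. The following is only a sketch that uses Python's re module as a stand-in, on the assumption that it is close enough to illustrate the point: it is not the Boost engine, but it rejects lookbehinds whose alternatives differ in length in a similar way. The helper name compiles and the variable names are made up for this illustration.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Both alternatives are exactly two characters long: "<digit> or <digit><digit>.
SAME_WIDTH = r'(?<="[0-9]|[0-9]{2}),(?=[0-9]{2}")'

# The alternatives are 2, 3 and 4 characters long.
MIXED_WIDTH = r'(?<="[0-9]|"[0-9]{2}|"[0-9]{3}),(?=[0-9]{2}")'

same_width_ok = compiles(SAME_WIDTH)
mixed_width_ok = compiles(MIXED_WIDTH)

# The second alternative never checks for a quotation mark, so the
# same-width pattern also accepts input without the opening quote.
unquoted_hit = re.search(SAME_WIDTH, '12,23"') is not None
three_digit_hit = re.search(SAME_WIDTH, '"123,23"') is not None

print(same_width_ok, mixed_width_ok, unquoted_hit, three_digit_hit)
```

In this stand-in engine the same-width pattern compiles while the mixed-width one is rejected, which mirrors the behaviour described for Sublime above.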

An alternative is to separate the lookbehinds:

(?:(?<=\"[0-9])|(?<=\"[0-9]{2})|(?<=\"[0-9]{3}))(,)(?=[0-9]{2}\")
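
As a rough check of the separated lookbehinds, here is a small sketch, again using Python's re module rather than Sublime's engine. Replacing the comma with a dot is an assumption made only for the demonstration; the question does not say what the replacement should be.

```python
import re

# Each lookbehind has a fixed length on its own (2, 3 or 4 characters),
# so the fixed-length restriction is satisfied.
COMMA = re.compile(
    r'(?:(?<="[0-9])|(?<="[0-9]{2})|(?<="[0-9]{3}))(,)(?=[0-9]{2}")'
)

samples = ['"1,23"', '"12,23"', '"123,23"']

# Hypothetical replacement: turn the decimal comma into a dot.
replaced = [COMMA.sub('.', s) for s in samples]

# Without the opening quotation mark, none of the lookbehinds succeed.
unquoted = COMMA.sub('.', '12,23"')

print(replaced)
print(unquoted)
```

Every alternative now requires the opening quotation mark, so the unquoted input is left unchanged.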

What can you do if you don't want to list all possible lengths?

There is a cool trick available in some regex engines (including Perl's, Ruby's and Sublime's): \K. \K roughly translates to "drop everything that was matched so far". Therefore, you can match any , within a float number surrounded by quotation marks with:

"\d+\K,(?=\d+")

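
Python's standard re module does not support \K, so the sketch below swaps in a different technique to get a similar effect: the part before the comma is captured and written back during the replacement. The function name replace_decimal_comma and the choice of a dot as replacement are assumptions for illustration only.

```python
import re

# Capture the opening quote and the digits before the comma; the lookahead
# checks for the remaining digits and the closing quote without consuming them.
QUOTED_FLOAT = re.compile(r'("\d+),(?=\d+")')

def replace_decimal_comma(text, replacement='.'):
    """Replace the comma inside quoted numbers such as "12,23"."""
    # Put the captured prefix back, followed by the replacement character.
    return QUOTED_FLOAT.sub(lambda m: m.group(1) + replacement, text)

converted = replace_decimal_comma('"1,23" "12,23" "123,23" 1234,56')
print(converted)
```

Because \d+ accepts any number of digits, no list of lengths is needed, and the unquoted number at the end of the sample input is not touched.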
