Why won't a longer token in an alternation be matched?


Problem description

I am using Ruby 2.1, but the same behavior can be reproduced on the Rubular site.

If this is my string:

儘管中國婦幼衛生監測辦公室制定的

And I do a regex match with this expression:

(中國婦幼衛生監測辦公室制定|管中)

I expect to get the longer token as the match:

中國婦幼衛生監測辦公室制定

Instead, I get the second alternative (管中) as the match.

As far as I know, it does work the way I expect when the text is not in Chinese characters.

If this is my string:

foobar

And I use this regex:

(foobar|foo)

The returned match is foobar. If the order is reversed, then the matched string is foo. That makes sense to me.

Recommended answer

Your assumption that a regex prefers the longer alternative is incorrect.

Quick refresher on how regex works: the state machine always reads from left to right, backtracking where necessary.

There are two pointers, one on the pattern:

(cdefghijkl|bcd)

The other on your string:

abcdefghijklmnopqrstuvw

The pointer on the string moves from the left. As soon as the engine can return a match, it will:


[Image (source: gyazo.com)]

Let's turn that into a more "sequential" walkthrough for understanding:


[Image (source: gyazo.com)]
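Since the images are not available here, the same walkthrough can be reproduced in Ruby. This is only a minimal sketch; the constant and method names (STRING, PATTERN, first_match and so on) are made up for illustration. It shows that the engine returns the first position in the string where any alternative succeeds, regardless of length:

```ruby
STRING  = "abcdefghijklmnopqrstuvw"
PATTERN = /(cdefghijkl|bcd)/

CJK_STRING  = "儘管中國婦幼衛生監測辦公室制定的"
CJK_PATTERN = /(中國婦幼衛生監測辦公室制定|管中)/

# Returns [matched_text, start_offset] for the first match, or nil.
def first_match(pattern, string)
  m = pattern.match(string)
  m && [m[0], m.begin(0)]
end

# "bcd" starts at offset 1, "cdefghijkl" only at offset 2,
# so the shorter alternative is found first.
p first_match(PATTERN, STRING)          # ["bcd", 1]

# Same situation in the question: 管中 starts one character earlier.
p first_match(CJK_PATTERN, CJK_STRING)  # ["管中", 1]
```

So the Chinese characters are not the cause; the shorter alternative simply begins earlier in the string.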

Your foobar example is a different topic. As I mentioned in this post:

How regex works: The state machine always reads from left to right. The pattern ,|,, is equivalent to a single , because only the first alternative will ever be matched.
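A quick way to check this in Ruby (a small sketch; the method name match_text is made up for illustration). When both alternatives can start at the same position, the one written first in the pattern wins:

```ruby
# Returns the text of the first match, or nil.
def match_text(pattern, string)
  string[pattern]
end

puts match_text(/(foobar|foo)/, "foobar")  # foobar
puts match_text(/(foo|foobar)/, "foobar")  # foo

# ,|,, never gets to use its second alternative:
p "a,,b".scan(/,|,,/)  # [",", ","]
p "a,,b".scan(/,/)     # [",", ","]
```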

That's good, Unihedron, but how do I force it to prefer the first alternative?

Behold!*

^(?:.*?\Kcdefghijkl|.*?\Kbcd)

Here is a regex demo.

This regex first tries the first alternative at every position in the string. Only if that fails completely does it then try the second alternative. \K resets the start of the reported match, so only the text after \K is returned.

*: \K has been supported in Ruby since 2.0.0.
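Here is a rough Ruby check of the regex above, assuming Ruby 2.0 or later for \K. The constant names are made up for illustration, and the second pattern is just the same construction applied to the strings from the question:

```ruby
PREFER_FIRST = /^(?:.*?\Kcdefghijkl|.*?\Kbcd)/

# The first alternative is found even though bcd starts earlier.
puts "abcdefghijklmnopqrstuvw"[PREFER_FIRST]  # cdefghijkl

# The second alternative is used only when the first is absent.
puts "abcdxyz"[PREFER_FIRST]                  # bcd

# Same idea for the original question.
PREFER_FIRST_CJK = /^(?:.*?\K中國婦幼衛生監測辦公室制定|.*?\K管中)/
puts "儘管中國婦幼衛生監測辦公室制定的"[PREFER_FIRST_CJK]  # 中國婦幼衛生監測辦公室制定
```

Note that this makes the order of the alternatives decide the result, not their length, so the longer token has to be listed first.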

Read more

  • The Stack Overflow Regex Reference
  • On greedy vs non-greedy





Ah, I was bored, so I optimized the regex:

^(?:(?:(?!cdefghijkl)c?[^c]*)++\Kcdefghijkl|(?:(?!bcd)b?[^b]*)++\Kbcd)

You can see the demo here.
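As a sanity check, the simple and the optimized regex can be compared in Ruby on a few inputs. This is only a sketch: the constant names and the sample strings are made up, and it says nothing about how much faster the optimized form actually is.

```ruby
SIMPLE    = /^(?:.*?\Kcdefghijkl|.*?\Kbcd)/
OPTIMIZED = /^(?:(?:(?!cdefghijkl)c?[^c]*)++\Kcdefghijkl|(?:(?!bcd)b?[^b]*)++\Kbcd)/

SAMPLES = [
  "abcdefghijklmnopqrstuvw",  # both tokens present
  "abcdxyz",                  # only bcd present
  "xbcdcdefghijkl",           # bcd first, longer token later
  "zzz"                       # neither present
]

SAMPLES.each do |s|
  # String#[] with a regex returns the matched text or nil.
  puts "#{s.inspect}: simple=#{s[SIMPLE].inspect} optimized=#{s[OPTIMIZED].inspect}"
end
```

On these samples both patterns should report the same text; the optimized one just skips ahead in larger steps instead of growing .*? one character at a time.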
