Does the regular expression engine skip over strings that are shorter than the pattern?

Question

I want to loop through a set of strings. On each string I want to loop through a set of regular expressions to determine which expressions match the string I'm on. However, if the string is shorter than the minimum length the pattern could possibly match, I want the regex engine to skip over it.

For example, say I am on the string "abc" and test it with this regex:

(?i)[A-Z]{3}

It matches. Then my next expression to test is something like:

(?i)[A-Z]+(?=123)

Will the engine still start examining the string from the beginning even though the second case will never be a match?

If this is the case, is there a way to get it to skip over strings that don't meet the minimum length requirement?

Recommended answer

When you're after implementation details, and when the source code is available, the best way to tell is to simply look at it. :)

The short answer is: not exactly.

The optimization implemented in the .NET regex implementation is a Boyer-Moore string search as the first phase of matching, when possible. Take a look at the source code (http://referencesource.microsoft.com/#System/regex/system/text/regularexpressions/RegexBoyerMoore.cs) for the gory details.

From the code itself:

// The RegexBoyerMoore object precomputes the Boyer-Moore
// tables for fast string scanning. These tables allow
// you to scan for the first occurance of a string within
// a large body of text without examining every character.
// The performance of the heuristic depends on the actual
// string and the text being searched, but usually, the longer
// the string that is being searched for, the fewer characters
// need to be examined.
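
To make the comment above concrete, here is a minimal sketch of the idea in Python. It uses Boyer-Moore-Horspool, a simplified variant of Boyer-Moore, and is not the .NET RegexBoyerMoore code; the function names are made up for illustration. It shows how a precomputed shift table lets the scan jump ahead by up to the length of the literal, and why a text shorter than the literal can be rejected without scanning at all.

```python
def build_shift_table(pattern: str) -> dict:
    """Bad-character shifts: for each character of the pattern except the
    last, the distance from its last occurrence to the end of the pattern."""
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if n < m:
        return -1  # text is shorter than the literal: nothing to scan
    shift = build_shift_table(pattern)
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Jump ahead based on the text character aligned with the
        # last position of the pattern; unknown characters skip m places.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

With the literal "123" and the text "abc123", the first window "abc" ends in "c", which does not occur in the literal, so the search jumps three characters at once and lands directly on the match.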

This requires an anchoring prefix, which is searched for by this function (http://referencesource.microsoft.com/#System/regex/system/text/regularexpressions/RegexFCD.cs,5559c2458a83d8ee,references), whose comment says:

/*
 * This is the one of the only two functions that should be called from outside.
 * It takes a RegexTree and computes the set of chars that can start it.
 */

The matching algorithm contains code which returns a no-match result immediately if the input string is shorter than the computed prefix.

Note that it's also looking for anchors and optimizing for these, of course.

I did not find a minimum length optimization in the code, but I admit I didn't read it thoroughly (gotta do that one day). I do know of other regex implementations which perform this kind of optimization (PCRE comes to mind). Anyway, the .NET implementation has its own way of optimizing things, and you should rely on that.
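
If profiling ever shows that the skip is worth doing by hand, one option is to store a minimum length next to each pattern and check it before calling the engine. The sketch below is in Python rather than .NET, and the minimum lengths are worked out by hand for the two patterns from the question; both the `PATTERNS` table and the `matching_patterns` helper are hypothetical names, not part of any regex library.

```python
import re

# (minimum match length, compiled pattern). The lengths are supplied by
# hand and are an assumption of this sketch, not something the engine reports.
PATTERNS = [
    (3, re.compile(r"(?i)[A-Z]{3}")),
    (4, re.compile(r"(?i)[A-Z]+(?=123)")),  # one letter plus the lookahead "123"
]


def matching_patterns(text, patterns=PATTERNS):
    """Return the source of every pattern that matches somewhere in text."""
    hits = []
    for min_len, regex in patterns:
        if len(text) < min_len:
            continue  # too short to ever match: skip the engine entirely
        if regex.search(text):
            hits.append(regex.pattern)
    return hits
```

For the string "abc" the second pattern is never handed to the engine, because the string is shorter than the four characters that pattern needs. The length check is a cheap integer comparison, so it costs little even when the engine would have rejected the string quickly on its own.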
