Rewrite IsHexString method with RegEx
Question
I have a method that checks whether a string is a valid hex string:
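// Requires: using System.Linq; (for All)
public bool IsHex(string value)
{
    // Reject null, empty, and odd-length input up front.
    if (string.IsNullOrEmpty(value) || value.Length % 2 != 0)
        return false;

    // The string must start with "0x", and every remaining character must be a hex digit.
    return
        value.Substring(0, 2) == "0x" &&
        value.Substring(2)
            .All(c => (c >= '0' && c <= '9') ||
                      (c >= 'a' && c <= 'f') ||
                      (c >= 'A' && c <= 'F'));
}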
The rules are:
The expression must be composed of an even number of hexadecimal digits (0-9, A-F, a-f).
The characters 0x must be the first two characters in the expression.
I'm sure it can be rewritten as a regex in a much cleaner and more efficient way.
Could you help me out with that?
Recommended answer
After you updated your question, the new regex that works for you should be:
^0x(?:[0-9A-Fa-f]{2})+$
Here I use (?: for non-capturing grouping, for efficiency. The {2} means that you want two of the previous expression (i.e., two hex chars), and the + means you want one or more of those two-character groups. Note that this disallows a bare 0x as a valid value.
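As a minimal sketch, the pattern above could be wrapped in a C# method like the one below. The class name HexValidator and the choice of RegexOptions are assumptions made for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexValidator
{
    // Pattern from the answer: "0x" followed by one or more pairs of hex digits.
    // The regex is built once and reused; the options are an illustrative choice.
    private static readonly Regex HexPattern = new Regex(
        "^0x(?:[0-9A-Fa-f]{2})+$",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public static bool IsHex(string value)
    {
        // Regex.IsMatch throws on null, so treat null as "not a hex string".
        return value != null && HexPattern.IsMatch(value);
    }
}
```

With this sketch, HexValidator.IsHex("0x1A2b") returns true, while "0x", "0x123", "1A2b" and "0xZZ" all return false. One thing to keep in mind: in .NET, $ also matches just before a final newline, so if a trailing "\n" must be rejected, \z may be the safer end anchor.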
"Oded" mentioned something about efficiency. I don't know your requirements, so I consider this more an exercise for the mind than anything else. A regex advances through the input in steps as large as its smallest repeated unit. For instance, running my own regex on 10,000 input strings of varying size (50-5000 characters), all correct, takes 1.1 seconds.
When I try the following regex:
^0x(?:[0-9A-Fa-f]{32})+(?:[0-9A-Fa-f]{2})+$
it runs about 40% faster, in 0.67 seconds. But be careful: knowing your input is the key to writing efficient regexes. For instance, if the regex fails, it will do a lot of backtracking. If half of my input strings have an incorrect length, the running time explodes to approximately 34 seconds, or 3000% (!), for the same input.
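To try this kind of comparison on your own data, a rough timing harness might look like the sketch below. Everything in it (the class HexRegexTiming, the helpers RandomHex, Measure and Compare, the seed and the input count) is hypothetical, and the timings it reports depend on the machine and runtime, so they are not expected to reproduce the figures quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class HexRegexTiming
{
    private const string HexDigits = "0123456789abcdefABCDEF";

    // Builds "0x" followed by digitCount random hex digits (synthetic test data).
    public static string RandomHex(Random rng, int digitCount)
    {
        var sb = new StringBuilder("0x", digitCount + 2);
        for (int i = 0; i < digitCount; i++)
            sb.Append(HexDigits[rng.Next(HexDigits.Length)]);
        return sb.ToString();
    }

    // Counts the matching inputs and measures how long the matching took.
    public static (int Matches, TimeSpan Elapsed) Measure(Regex regex, IReadOnlyList<string> inputs)
    {
        var watch = Stopwatch.StartNew();
        int matches = 0;
        foreach (var input in inputs)
            if (regex.IsMatch(input)) matches++;
        watch.Stop();
        return (matches, watch.Elapsed);
    }

    // Runs both patterns from the answer over the same valid inputs.
    public static (int Matches, TimeSpan Elapsed)[] Compare(int count, int seed)
    {
        var rng = new Random(seed);
        var inputs = new List<string>(count);
        for (int i = 0; i < count; i++)
            inputs.Add(RandomHex(rng, 2 * rng.Next(24, 2500))); // 50 to 5000 chars, even length

        var pairs = new Regex("^0x(?:[0-9A-Fa-f]{2})+$", RegexOptions.Compiled);
        var chunks = new Regex("^0x(?:[0-9A-Fa-f]{32})+(?:[0-9A-Fa-f]{2})+$", RegexOptions.Compiled);
        return new[] { Measure(pairs, inputs), Measure(chunks, inputs) };
    }
}
```

Calling HexRegexTiming.Compare(10000, 42) and printing the two Elapsed values gives a first impression; to see the backtracking cost, the generator would have to be changed to emit some odd-length strings as well.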
It becomes even trickier if most input strings are large. If 99% of your input is of valid length and longer than 4130 characters, and only a few strings are not, writing
^0x(?:[0-9A-Fa-f]{4096})+(?:[0-9A-Fa-f]{32})+(?:[0-9A-Fa-f]{2})+$
is efficient and improves the running time even more. However, if many strings have an incorrect length (that is, they fail the length % 2 == 0 check), this is counter-productive because of backtracking.
Finally, if most of your strings satisfy the even-number rule, and only some or many strings contain a wrong character, the speed goes up: the more inputs that contain a wrong character, the better the performance. That is because the regex can break out immediately when it finds an invalid character.
Conclusion: if your input is a mix of small and large strings, wrong characters, and wrong counts, your fastest approach would be to combine a check of the string's length (instantaneous in .NET) with an efficient regex.
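The combination described in the conclusion could be sketched as follows. The class name FastHexValidator and the minimum-length check are assumptions for illustration; the pattern is the first one from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FastHexValidator
{
    // Same pattern as earlier in the answer: "0x" plus one or more hex digit pairs.
    private static readonly Regex HexPattern = new Regex(
        "^0x(?:[0-9A-Fa-f]{2})+$",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public static bool IsHex(string value)
    {
        // Cheap rejections first: null, shorter than "0x" plus one pair, or odd length.
        // These inputs never reach the regex engine.
        if (value == null || value.Length < 4 || value.Length % 2 != 0)
            return false;

        // Only even-length candidates are handed to the regex.
        return HexPattern.IsMatch(value);
    }
}
```

Because odd-length strings are rejected before the regex runs, the wrong-length inputs that caused the heavy backtracking in the measurements above are filtered out by a constant-time check.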