Rewrite IsHexString method with RegEx

Question

I have a method that checks whether a string is a valid hex string:

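// Requires: using System.Linq;
public bool IsHex(string value)
{
  // Reject null/empty input and odd lengths (the "0x" prefix counts toward the length).
  if (string.IsNullOrEmpty(value) || value.Length % 2 != 0)
    return false;

  // The first two characters must be "0x"; every remaining character must be a hex digit.
  return
    value.Substring(0, 2) == "0x" &&
    value.Substring(2)
      .All(c => (c >= '0' && c <= '9') ||
                (c >= 'a' && c <= 'f') ||
                (c >= 'A' && c <= 'F'));
}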

The rules are:
The expression must be composed of an even number of hexadecimal digits (0-9, A-F, a-f).
The characters 0x must be the first two characters in the expression.

I'm sure it can be rewritten as a regex in a much cleaner and more efficient way.
Could you help me out with that?

Recommended answer

After you updated your question, the new regex that works for you should be:

^0x(?:[0-9A-Fa-f]{2})+$

Here I use (?: for non-capturing grouping, for efficiency. The {2} means that you want exactly two of the preceding expression (i.e., two hex characters), and the + means that you want one or more of those two-character groups. Note that this disallows a bare 0x as a valid value, whereas the original method accepts it.
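As a rough illustration of how the pattern above could replace the original method, here is a minimal C# sketch. The class name `HexCheck` and the use of `RegexOptions.Compiled` are assumptions made for this example, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexCheck
{
    // Pattern from the answer: "0x" followed by one or more pairs of hex digits.
    // RegexOptions.Compiled is an assumption here; it trades startup cost for match speed.
    private static readonly Regex HexPattern =
        new Regex(@"^0x(?:[0-9A-Fa-f]{2})+$", RegexOptions.Compiled);

    public static bool IsHex(string value)
    {
        // Regex.IsMatch throws ArgumentNullException on null, so guard against it.
        return value != null && HexPattern.IsMatch(value);
    }

    public static void Main()
    {
        Console.WriteLine(IsHex("0x1f"));  // True
        Console.WriteLine(IsHex("0x1"));   // False: odd number of digits
        Console.WriteLine(IsHex("0x"));    // False: no digits at all
        Console.WriteLine(IsHex("1f"));    // False: missing "0x" prefix
    }
}
```

One detail worth knowing: in .NET, `$` without `RegexOptions.Multiline` also matches just before a final newline, so `"0x1f\n"` would pass this check. If that matters for your input, ending the pattern with `\z` instead of `$` is the stricter choice.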

Oded mentioned something about efficiency. I don't know your requirements, so I consider this more of a mental exercise than anything else. A regex moves through the input in leaps as large as its smallest repeated unit (here, two characters at a time). For instance, running my own regex over 10,000 input strings of varying size (50-5000 characters), all valid, takes 1.1 seconds.

When I try the following regex:

^0x(?:[0-9A-Fa-f]{32})+(?:[0-9A-Fa-f]{2})+$

it runs about 40% faster, in 0.67 seconds. But be careful: knowing your input is the key to writing efficient regexes. For instance, if the regex fails to match, it will do a lot of backtracking. If half of my input strings have an incorrect length, the running time explodes to approximately 34 seconds, or 3000% (!), for the same number of inputs. Note also that this pattern only matches strings with at least 34 hex digits, which is fine for the test data described above but not in general.
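The timings above depend heavily on the machine, the .NET version, and the input, so it is worth measuring on your own data. The following is a minimal sketch of such a measurement, assuming 10,000 valid inputs with 50 to 5000 hex digits each. The names `HexRegexTiming`, `MakeHex`, and `Time` are hypothetical helpers invented for this example, and it is not the benchmark the answer's author actually ran.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class HexRegexTiming
{
    // Builds "0x" followed by `digits` pseudo-random hex digits.
    public static string MakeHex(Random rng, int digits)
    {
        const string alphabet = "0123456789abcdefABCDEF";
        var sb = new StringBuilder("0x", digits + 2);
        for (int i = 0; i < digits; i++)
            sb.Append(alphabet[rng.Next(alphabet.Length)]);
        return sb.ToString();
    }

    // Runs the pattern over every input, counts matches, and returns the elapsed time.
    public static TimeSpan Time(Regex pattern, string[] inputs, out int matches)
    {
        matches = 0;
        var sw = Stopwatch.StartNew();
        foreach (var s in inputs)
            if (pattern.IsMatch(s)) matches++;
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        var rng = new Random(42); // fixed seed so repeated runs use the same data
        // 10,000 valid inputs, each with an even number (50..5000) of hex digits.
        var inputs = Enumerable.Range(0, 10000)
            .Select(_ => MakeHex(rng, 2 * rng.Next(25, 2501)))
            .ToArray();

        var pairs = new Regex(@"^0x(?:[0-9A-Fa-f]{2})+$");
        var chunked = new Regex(@"^0x(?:[0-9A-Fa-f]{32})+(?:[0-9A-Fa-f]{2})+$");

        int m1, m2;
        var t1 = Time(pairs, inputs, out m1);
        var t2 = Time(chunked, inputs, out m2);
        Console.WriteLine("pairs:   {0} ms, {1} matches", t1.TotalMilliseconds, m1);
        Console.WriteLine("chunked: {0} ms, {1} matches", t2.TotalMilliseconds, m2);
    }
}
```

To observe the backtracking cost described above, one could change `MakeHex` calls so that some inputs get an odd number of digits. Expect the chunked pattern to slow down noticeably in that case, so start with a small input count.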

It becomes even trickier if most input strings are large. If 99% of your input strings have a valid length and are longer than 4130 characters, and only a few are not, then writing

^0x(?:[0-9A-Fa-f]{4096})+(?:[0-9A-Fa-f]{32})+(?:[0-9A-Fa-f]{2})+$

is efficient and improves the running time even more. However, if many strings have an incorrect length (that is, length % 2 != 0), this is counterproductive because of backtracking.

Finally, if most of your strings satisfy the even-number rule, and only some or many strings contain a wrong character, the speed goes up: the more inputs that contain a wrong character, the better the performance. That is because the regex can stop immediately when it finds an invalid character.

Conclusion: if your input is a mix of small strings, large strings, strings with wrong characters, and strings with a wrong character count, your fastest approach would be to combine a check of the string's length (instantaneous in .NET) with an efficient regex.
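One way to read the conclusion is sketched below: reject wrong lengths in plain code first, then let a simple regex check the characters. The class name `HexValidator` and the simplified pattern `^0x[0-9A-Fa-f]+\z` are assumptions made for this example. The simplification relies on the length check having already guaranteed an even number of digits, so the regex no longer needs to count pairs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexValidator
{
    // Assumed simplified pattern: the length check below already guarantees an
    // even digit count, so the regex only verifies the prefix and the characters.
    // \z anchors at the very end of the string (unlike $, it does not accept a
    // position before a trailing newline).
    private static readonly Regex HexDigitsOnly =
        new Regex(@"^0x[0-9A-Fa-f]+\z", RegexOptions.Compiled);

    public static bool IsHex(string value)
    {
        // Cheap checks first: null, too short for "0x" plus one digit pair, odd length.
        // Requiring at least 4 characters rejects a bare "0x", matching the answer's regex.
        if (value == null || value.Length < 4 || value.Length % 2 != 0)
            return false;

        // Only now pay for the regex, which fails fast on the first invalid character.
        return HexDigitsOnly.IsMatch(value);
    }

    public static void Main()
    {
        Console.WriteLine(IsHex("0x1f"));    // True
        Console.WriteLine(IsHex("0x1f5"));   // False: rejected by the length check
        Console.WriteLine(IsHex("0x1g"));    // False: rejected by the regex
    }
}
```

Because odd-length strings never reach the regex, the backtracking case discussed earlier does not arise in this sketch. Whether this is actually the fastest option for a given workload should still be measured.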
