Regular expression to allow alphanumeric, max one space etc


Problem description


I'm opening this thread, which is really similar to another one, but I cannot figure out an issue:
I have an input field that allows an alphanumeric string with an optional single space as a separator, then another optional alphanumeric string, etc.
I found this regex:

^([0-9a-zA-z]+ ?)*$

It works! But the performance is really bad as soon as I have 2 consecutive spaces in a long sentence and those 2 spaces are located far into the sentence.
In the example below, the result comes back in half a second if I put the 2 spaces at the beginning of the sentence.
But it takes 10 seconds or more if they are located far into it.

dzdff5464zdiophjazdioj ttttttttt zoddzdffdziophjazdioj ttttttttt zoddzdffdzdff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zoddzdfff ttttt zo999  ddzdfff ttttt zoddzdfff ttttt zoddzdff

The 2 spaces are after the 999.
Do you have any idea or suggestion to improve this regex?

Thanks and regards

PF

PS: you can reproduce the issue with any invalid character placed far into the string, not specifically 2 spaces.

EDIT: another example:
12345678901234567890' ==> 20 chars + 1 invalid char => the result is immediate.
Add 5 valid chars and the regex takes 5 seconds to run!
1234567890123456789012345'

Solution

I suggest changing the expression to something like this:

(?i)^[0-9a-z]+(?:\s[0-9a-z]+)*$

This is functionally similar in that it matches alphanumeric words delimited by a single whitespace character. A major difference is that I moved the initial word check to the front of the expression, then made a non-capturing group (?:...) for the remaining space-delimited words.
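To make the behaviour of the suggested expression concrete, here is a minimal Python sketch. The helper name `is_valid` and the sample strings are made up for illustration, and Python's `re` module is only an assumption, since the question does not say which regex engine is in use.

```python
import re

# The pattern suggested in the answer, unchanged.
WORDS = re.compile(r"(?i)^[0-9a-z]+(?:\s[0-9a-z]+)*$")


def is_valid(text: str) -> bool:
    """Return True for alphanumeric words separated by exactly one whitespace character.

    fullmatch is used so that a trailing newline is rejected too;
    in Python, `$` alone would tolerate one.
    """
    return WORDS.fullmatch(text) is not None


# Hypothetical sample inputs, chosen to show accepted and rejected cases.
for sample in ["abc", "ABC def 123", "abc  def", "abc def ", " abc", "abc_def", ""]:
    print(repr(sample), is_valid(sample))
```

Two differences from the original expression are worth checking against your requirements. The suggested pattern rejects the empty string and a trailing space, both of which `^([0-9a-zA-z]+ ?)*$` accepts. Also, `\s` matches tabs and newlines as well as spaces, so a literal space may be the better choice if only a space is allowed.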

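Non-capturing groups (?:...) are faster than capturing groups (...) because the regex engine doesn't need to retain the matched values. And by moving the space \s to the front of the word group for the repeated words, the engine doesn't need to validate that the first character in the group is included in the character class. More importantly, because the separator is now mandatory, a run of alphanumeric characters can only be read as one word, so a failing input no longer forces the engine to try every possible way of splitting it, which is what ([0-9a-z]+ ?)* does.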

You also have a typo in your character class [0-9a-zA-z]: the last z should probably be upper case. The A-z range also covers the ASCII characters that sit between Z and a, namely [ \ ] ^ _ and the backtick, so those would be accepted as well. In my expression I simply added a (?i) to the beginning to force the regex engine into case-insensitive mode, and I reduced the character class to [0-9a-z].
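The effect of the `A-z` typo can be checked directly. The sketch below, again assuming Python's `re` module, lists the ASCII characters between `Z` and `a` and tests each one against both versions of the character class. The variable names are illustrative only.

```python
import re

# ASCII characters whose codes fall strictly between 'Z' (90) and 'a' (97).
between = [chr(code) for code in range(ord("Z") + 1, ord("a"))]

typo_class = re.compile(r"[0-9a-zA-z]")   # class from the question, with the typo
fixed_class = re.compile(r"[0-9a-zA-Z]")  # intended class

# Which of those punctuation characters does each class accept?
accepted_by_typo = [ch for ch in between if typo_class.fullmatch(ch)]
accepted_by_fixed = [ch for ch in between if fixed_class.fullmatch(ch)]

print("between Z and a:  ", between)
print("accepted by A-z:  ", accepted_by_typo)
print("accepted by A-Z:  ", accepted_by_fixed)
```

With the typo, all six punctuation characters pass validation. With `A-Z`, none of them do.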

In my testing I see that your expression ^([0-9a-z]+ ?)*$ takes about 0.03 seconds to process your sample text with 2 extra spaces toward the end. My recommended expression completes the same test in about 0.000022 seconds, which is more than 1,000 times faster. Wow, that's an amazing delta.
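The timings above come from the answerer's own environment, which is not specified. As a rough way to reproduce the trend, the following sketch uses Python's `re` module (an assumption) and the digits-plus-apostrophe input from the question's EDIT. The helper `time_match` is hypothetical, and the lengths are kept short on purpose because the original pattern's running time roughly doubles with every extra valid character.

```python
import re
import time

# Pattern from the question (typo included) and the pattern from the answer.
SLOW = re.compile(r"^([0-9a-zA-z]+ ?)*$")
FAST = re.compile(r"(?i)^[0-9a-z]+(?:\s[0-9a-z]+)*$")


def time_match(pattern, text):
    """Return (matched, elapsed_seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


# Valid digits followed by one invalid character, as in the question's EDIT.
for length in (10, 12, 14, 16, 18):
    text = ("1234567890" * 2)[:length] + "'"
    _, slow_time = time_match(SLOW, text)
    _, fast_time = time_match(FAST, text)
    print(f"{length:2d} chars  slow={slow_time:.6f}s  fast={fast_time:.6f}s")
```

Absolute numbers depend on the machine and the engine. The point to look for is that the first column of timings grows quickly with the input length, while the second stays essentially flat.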
