Regular expression to allow spaces between words
Problem description
I want a regular expression that prevents symbols and only allows letters and numbers. The regex below works great, but it doesn't allow for spaces between words.
^[a-zA-Z0-9_]*$
For example, when using this regular expression "HelloWorld" is fine, but "Hello World" does not match.
How can I tweak it to allow spaces?
tl;dr
Just add a space in your character class.
^[a-zA-Z0-9_ ]*$
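As a quick check of the difference, here is a minimal Python sketch (Python is only an illustration here; the question itself is tagged vb.net). The helper name matches is hypothetical, and re.fullmatch is used because Python's $ also matches just before a trailing newline.

```python
import re

ORIGINAL = r"^[a-zA-Z0-9_]*$"     # pattern from the question
WITH_SPACE = r"^[a-zA-Z0-9_ ]*$"  # same pattern with a space in the class


def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` matches `pattern` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


print(matches(ORIGINAL, "Hello World"))      # False: the space is rejected
print(matches(WITH_SPACE, "Hello World"))    # True
print(matches(WITH_SPACE, "Hello, World!"))  # False: symbols are still rejected
```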
Now, if you want to be strict...
The above isn't exactly correct. Because * means zero or more, it also matches all of the following cases, which one would not usually intend to match:
- An empty string, "".
- A string consisting entirely of spaces, "   ".
- A string with leading and/or trailing spaces, " Hello World ".
- A string that contains multiple spaces between words, "Hello   World".
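The four cases above can be confirmed with a short Python sketch. The sample strings are illustrative stand-ins for the examples in the list, and the names LOOSE, edge_cases and accepted are assumptions made for this example.

```python
import re

# The permissive pattern from the tl;dr.
LOOSE = re.compile(r"^[a-zA-Z0-9_ ]*$")

# One sample per case: empty, only spaces, leading/trailing spaces, repeated spaces.
edge_cases = ["", "   ", " Hello World ", "Hello   World"]

# Keep every sample the permissive pattern accepts.
accepted = [s for s in edge_cases if LOOSE.fullmatch(s)]

print(accepted == edge_cases)  # True: all four unwanted inputs are accepted
```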
Originally I didn't think such details were worth going into, as OP was asking such a basic question that it seemed strictness wasn't a concern. Now that the question's gained some popularity however, I want to say...
...use @stema's answer.
Which, in my flavor (without using \w), translates to:
^[a-zA-Z0-9_]+( [a-zA-Z0-9_]+)*$
(Please upvote @stema regardless.)
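A minimal sketch of how the stricter pattern behaves, again in Python purely for illustration. The function name is_valid is hypothetical.

```python
import re

# One or more word characters, then zero or more groups of
# "a single space followed by one or more word characters".
STRICT = re.compile(r"^[a-zA-Z0-9_]+( [a-zA-Z0-9_]+)*$")


def is_valid(text: str) -> bool:
    """Hypothetical helper: True when `text` is words separated by single spaces."""
    return STRICT.fullmatch(text) is not None


for text in ["Hello World", "HelloWorld", "", "   ", " Hello World ", "Hello   World"]:
    print(repr(text), is_valid(text))
# Only the first two strings are accepted; the four edge cases are rejected.
```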
Some things to note about this (and @stema's) answer:
If you want to allow multiple spaces between words (say, if you'd like to allow accidental double-spaces, or if you're working with copy-pasted text from a PDF), then add a + after the space:
^\w+( +\w+)*$
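A small Python sketch of the multiple-space variant. One assumption to note: in Python 3, \w matches Unicode word characters by default, so re.ASCII is passed here to keep it equivalent to [a-zA-Z0-9_]. The name MULTI_SPACE is made up for this example.

```python
import re

# " +" allows one or more spaces between words; re.ASCII keeps \w to [a-zA-Z0-9_].
MULTI_SPACE = re.compile(r"^\w+( +\w+)*$", re.ASCII)

print(bool(MULTI_SPACE.fullmatch("Hello World")))    # True
print(bool(MULTI_SPACE.fullmatch("Hello   World")))  # True: repeated spaces are allowed
print(bool(MULTI_SPACE.fullmatch("Hello\tWorld")))   # False: a tab is not a space
print(bool(MULTI_SPACE.fullmatch(" Hello World")))   # False: leading space
```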
If you want to allow tabs and newlines (whitespace characters), then replace the space with \s+:
^\w+(\s+\w+)*$
Here I suggest the + by default because, for example, Windows line breaks consist of two whitespace characters in sequence, \r\n, so you'll need the + to catch both.
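To illustrate why the + matters for Windows line breaks, here is a hedged Python sketch comparing \s+ with a single \s. The names ANY_WS and SINGLE_WS are hypothetical, and re.ASCII is an assumption made to keep \w and \s limited to their ASCII meanings.

```python
import re

ANY_WS = re.compile(r"^\w+(\s+\w+)*$", re.ASCII)    # one or more whitespace characters
SINGLE_WS = re.compile(r"^\w+(\s\w+)*$", re.ASCII)  # exactly one whitespace character

windows_text = "Hello\r\nWorld"  # \r\n is two whitespace characters in a row

print(bool(ANY_WS.fullmatch(windows_text)))     # True
print(bool(SINGLE_WS.fullmatch(windows_text)))  # False: \s consumes only the \r
print(bool(ANY_WS.fullmatch("Hello\tWorld")))   # True: tabs count as whitespace
```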
Still not working?
Check which dialect of regular expressions you're using.* In languages like Java you'll have to escape your backslashes inside string literals, i.e. \\w and \\s. In older or more basic languages and utilities, like some versions of sed, \w and \s may not be defined, so write them out as character classes, e.g. [a-zA-Z0-9_] and [ \f\n\r\t\v], respectively.
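Dialect differences show up in Python too, which makes it a convenient place to sketch the point. The example below assumes Python 3, where \w and \s are Unicode-aware unless re.ASCII is passed. The names SHORTHAND, EXPLICIT, UNICODE_AWARE and samples are hypothetical.

```python
import re

# Shorthand classes restricted to ASCII, and the same pattern written out by hand.
SHORTHAND = re.compile(r"^\w+(\s+\w+)*$", re.ASCII)
EXPLICIT = re.compile(r"^[a-zA-Z0-9_]+([ \f\n\r\t\v]+[a-zA-Z0-9_]+)*$")

# Without re.ASCII, Python's \w also accepts non-ASCII letters.
UNICODE_AWARE = re.compile(r"^\w+(\s+\w+)*$")

samples = ["Hello World", "Hello\r\nWorld", " Hello", "Hello!", "", "a_1\tb_2"]

# The shorthand and explicit forms agree on every sample.
print(all(bool(SHORTHAND.fullmatch(s)) == bool(EXPLICIT.fullmatch(s)) for s in samples))  # True

print(bool(UNICODE_AWARE.fullmatch("héllo wörld")))  # True
print(bool(EXPLICIT.fullmatch("héllo wörld")))       # False
```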
* I know this question is tagged vb.net, but based on 25,000+ views, I'm guessing it's not only VB.NET users who are coming across it. Currently it's the first hit on Google for the search phrase "regular expression space word".