Regular expression to allow spaces between words


Question


I want a regular expression that prevents symbols and only allows letters and numbers. The regex below works great, but it doesn't allow for spaces between words.

^[a-zA-Z0-9_]*$

For example, when using this regular expression "HelloWorld" is fine, but "Hello World" does not match.

How can I tweak it to allow spaces?

Answer

tl;dr

Just add a space in your character class.

^[a-zA-Z0-9_ ]*$

 


Now, if you want to be strict...

The above isn't exactly correct. Because * means zero or more, it also matches all of the following cases, which one would not usually mean to match:

  • An empty string, "".
  • A string consisting entirely of spaces, "      ".
  • A string that leads and / or trails with spaces, "   Hello World  ".
  • A string that contains multiple spaces in between words, "Hello   World".
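The four cases above can be checked directly. Below is a minimal sketch in Python, chosen here only for illustration; the helper name `loose_ok` is hypothetical. It uses `re.fullmatch` because Python's `$` also matches just before a trailing newline, which could otherwise let an extra character slip through.

```python
import re

# The tl;dr pattern: the original character class with a space added.
LOOSE = re.compile(r"^[a-zA-Z0-9_ ]*$")

def loose_ok(text: str) -> bool:
    """Return True if the loose pattern accepts the whole string."""
    return LOOSE.fullmatch(text) is not None

samples = [
    "Hello World",       # the case the question asks for
    "",                  # empty string
    "      ",            # only spaces
    "   Hello World  ",  # leading and trailing spaces
    "Hello   World",     # several spaces between words
    "Hello, World!",     # symbols, which should still be rejected
]

for s in samples:
    print(repr(s), loose_ok(s))
```

Every sample except the last is accepted, which is why the loose pattern is usually too permissive for input validation.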

Originally I didn't think such details were worth going into, as OP was asking such a basic question that it seemed strictness wasn't a concern. Now that the question's gained some popularity however, I want to say...

...use @stema's answer.

In my flavor (without using \w), that translates to:

^[a-zA-Z0-9_]+( [a-zA-Z0-9_]+)*$

(Please upvote @stema regardless.)
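As a quick check of the stricter pattern, here is a small Python sketch (the language and the helper name `strict_ok` are assumptions made for illustration). It requires at least one word, and exactly one space between consecutive words.

```python
import re

# One or more word characters, then zero or more groups of
# "single space + one or more word characters".
STRICT = re.compile(r"^[a-zA-Z0-9_]+( [a-zA-Z0-9_]+)*$")

def strict_ok(text: str) -> bool:
    """Return True if the strict pattern accepts the whole string."""
    return STRICT.fullmatch(text) is not None

for s in ["HelloWorld", "Hello World", "a b c", "", "   ", " Hello World ", "Hello  World"]:
    print(repr(s), strict_ok(s))
```

The first three samples pass; the empty string, the all-spaces string, the padded string, and the double-spaced string are all rejected.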

Some things to note about this (and @stema's) answer:

  • If you want to allow multiple spaces between words (say, if you'd like to allow accidental double-spaces, or if you're working with copy-pasted text from a PDF), then add a + after the space:

    ^\w+( +\w+)*$
    

  • If you want to allow tabs and newlines (whitespace characters), then replace the space with a \s+:

    ^\w+(\s+\w+)*$
    

    Here I suggest the + by default because, for example, Windows linebreaks consist of two whitespace characters in sequence, \r\n, so you'll need the + to catch both.
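The two variants above, and the reason for the + after \s, can be compared side by side. This is a sketch in Python under a couple of assumptions: the constant names are made up for illustration, and `re.ASCII` is passed so that \w means [a-zA-Z0-9_] as it does in the answer (without it, Python's \w also matches non-ASCII letters).

```python
import re

# One or more literal spaces between words.
MULTI_SPACE = re.compile(r"^\w+( +\w+)*$", re.ASCII)

# One or more whitespace characters (space, tab, newline, ...) between words.
ANY_WHITESPACE = re.compile(r"^\w+(\s+\w+)*$", re.ASCII)

# Same as above but WITHOUT the +, so exactly one whitespace character.
SINGLE_WHITESPACE = re.compile(r"^\w+(\s\w+)*$", re.ASCII)

def accepts(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern accepts the whole string."""
    return pattern.fullmatch(text) is not None

print(accepts(MULTI_SPACE, "Hello   World"))           # several spaces
print(accepts(MULTI_SPACE, "Hello\tWorld"))            # a tab is not a space
print(accepts(ANY_WHITESPACE, "Hello\r\nWorld"))       # Windows line break
print(accepts(SINGLE_WHITESPACE, "Hello\r\nWorld"))    # \r\n is two characters
```

The last call returns False because \r\n needs two whitespace matches in a row, which is the case the + is there to handle.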

Still not working?

Check what dialect of regular expressions you're using.* In languages like Java you'll have to escape your backslashes inside string literals, i.e. \\w and \\s. In older or more basic languages and utilities, like sed, \w and \s may not be defined, so write them out with character classes, e.g. [a-zA-Z0-9_] and [ \f\n\r\t\v], respectively.
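Dialect differences show up even within one language. The sketch below uses Python as an example host language (an assumption; the helper `matches` and the constant names are hypothetical). It shows three things: a raw string and a string with doubled backslashes produce the same pattern, Python 3's \w matches non-ASCII letters unless `re.ASCII` is given, and explicit character classes sidestep that difference.

```python
import re

# A raw string and a normal string with doubled backslashes are the same pattern.
RAW = r"^\w+(\s+\w+)*$"
ESCAPED = "^\\w+(\\s+\\w+)*$"
print(RAW == ESCAPED)

# The same pattern written out with explicit character classes.
EXPLICIT = r"^[a-zA-Z0-9_]+([ \f\n\r\t\v]+[a-zA-Z0-9_]+)*$"

def matches(pattern: str, text: str, flags: int = 0) -> bool:
    """Return True if the pattern accepts the whole string."""
    return re.fullmatch(pattern, text, flags) is not None

text = "héllo wörld"
print(matches(RAW, text))            # Unicode-aware \w accepts accented letters
print(matches(RAW, text, re.ASCII))  # ASCII-only \w rejects them
print(matches(EXPLICIT, text))       # explicit classes reject them too
print(matches(EXPLICIT, "Hello\tWorld"))
```

If the goal is strictly ASCII letters, digits, and underscores, the explicit classes (or the `re.ASCII` flag) make that intent unambiguous.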

 


* I know this question is tagged for a specific language, but based on 25,000+ views, I'm guessing it's not only those folks who are coming across this question. Currently it's the first hit on Google for the search phrase "regular expression space word".
