Ignoring white space for a Regex match


Question

I need to match 8 or more digits, the sequence of which can include spaces.

For example, all of the below would be valid matches:

12345678
1 2345678
12 3 45678
1234 5678
12 34567 8
1 2 3 4 5 6 7 8

At the moment I have \d{8,}, but this will only capture a solid block of 8 or more digits.

[\d\s]{8,} will not work, as I don't want white space to contribute to the count of characters captured.

Recommended answer

Way later, but this really needs the correct answer on it, and a reason why. Who knew this question could have such a complex answer, right? Lol. But there are plenty of considerations surrounding spacing in regex.

First: never put a bare space in a regex. Doing so makes your regex unreadable and unmaintainable; memories of using a mouse to highlight a space, to make sure it was only one space, come to mind. A run of several typed spaces will break your regex, but [    ] won't, because repetition inside a character class is ignored. And if you require an exact number of spaces, you can actually see that in a character class, like so: [ ]{3}. Compare that with the accident you get without the character class: three typed spaces followed by {3} actually looks for 5 spaces (two literal spaces, then a third repeated three times). Whoops!
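To make the counting concrete, here is a minimal sketch, assuming the accidental pattern was typed as three spaces followed by {3}. The class and member names are hypothetical and only there for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class LiteralSpaceDemo
{
    // Three typed spaces followed by {3}: the quantifier binds only to the
    // last space, so the pattern needs 2 + 3 = 5 spaces in total.
    public const string Accidental = "^   {3}$";

    // The character class makes the single space visible, and {3} applies to it alone.
    public const string Explicit = "^[ ]{3}$";

    // Repeating the space inside the class changes nothing: it still matches one space.
    public const string RepeatedInClass = "^[    ]$";

    // True if a string of exactly spaceCount spaces matches the pattern.
    public static bool Matches(string pattern, int spaceCount) =>
        Regex.IsMatch(new string(' ', spaceCount), pattern);
}
```

Under these assumptions, LiteralSpaceDemo.Matches(LiteralSpaceDemo.Accidental, 3) is false while the same call with 5 is true, and the Explicit pattern accepts exactly 3 spaces.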

Second: keep the free-spacing (?x) option in mind, which lets you add comments and free spacing to your regex. You shouldn't have to fear that somebody turning that option on will break your regex because you decided to put random keyboard spaces in it. Also, (?x) does not ignore a keyboard space when it is inside a character class, like so: [ ]. It is therefore safer to use character classes for your keyboard spaces.
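A small sketch of the difference under (?x), using two made-up patterns for a 4+4 digit string. The class and constant names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class FreeSpacingDemo
{
    // Under (?x) the bare space is dropped, so this behaves like ^\d{4}\d{4}$.
    public const string BareSpace = @"(?x)^\d{4} \d{4}$";

    // Inside a character class the space is still matched literally.
    public const string ClassSpace = @"(?x)^\d{4}[ ]\d{4}$";

    public static bool IsMatch(string pattern, string input) =>
        Regex.IsMatch(input, pattern);
}
```

With these patterns, BareSpace matches "12345678" but not "1234 5678", while ClassSpace matches "1234 5678" but not "12345678".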

Third: try not to use \s in this scenario. As Omaghosh points out, it also includes newlines (\r and \n), and the scenario you mentioned wouldn't seem to favor that. However, also as Omaghosh points out, you may want more than just keyboard spaces. So you can use [ ], [\s-[\r\n]], or [\f\t\v\u00A0\u2028\u2029\u0020], depending on what you fancy. The last two of those options do roughly the same job, but character class subtraction only works in .NET and a couple of other weird flavors.
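The newline problem can be seen with two 4-digit lines, which should not count as one 8-digit sequence. This is only a sketch; the helper name is hypothetical and the inputs are invented for the example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class WhitespaceClassDemo
{
    // Returns the text of the first match, or an empty string when nothing matches.
    public static string FirstMatch(string pattern, string input) =>
        Regex.Match(input, pattern).Value;
}
```

Called with the input "1234\n5678", the \s version @"(?:\d\s*){8,}" returns both lines joined as a single match, whereas @"(?:\d[ ]*){8,}" returns an empty string. The subtraction class @"(?:\d[\s-[\r\n]]*){8,}" also rejects that input but still accepts a tab, as in "1234\t5678".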

Fourth: this is a commonly over-built pattern: (\s*...\s*)*. It doesn't make any sense. Because the group repeats, it is the same as (\s*\s*...)* or (\s*\s*\s*\s*...)*. The only argument against what I'm saying is that you'd be guaranteed to capture the spaces prior to the ..., but not once is that ever actually wanted. Worst-case scenario, you might see this: \s*(...\s*)*
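As a rough check of that claim, the sketch below substitutes \d for the elided ... (an assumption made purely for the example) and anchors both forms so they can be compared on whole strings. The names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; \d stands in for the elided "..." part.
public static class OverbuiltDemo
{
    // Whitespace on both sides of every repetition.
    public const string Overbuilt = @"^(\s*\d\s*)*$";

    // Leading whitespace once, then whitespace only after each item.
    public const string Trimmed = @"^\s*(\d\s*)*$";

    // True when both patterns give the same verdict for the input.
    public static bool Agree(string input) =>
        Regex.IsMatch(input, Overbuilt) == Regex.IsMatch(input, Trimmed);
}
```

In this sketch the two forms agree on inputs such as " 1 2 3 ", "12", "1 a" and the empty string. The one place they differ is non-empty whitespace-only input such as "  ", which only the second form accepts.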

Omaghosh had the closest answer, but this is the shortest correct answer:

string result = Regex.Match(input, @"(?:\d[ ]*){8,}").Groups[0].Value;

Or the following, if we take the question literally and the six options are in the same text on multiple lines:

string result = Regex.Match(input, @"(?m)^(?:\d[ ]*){8,}$").Groups[0].Value;
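
For reference, a minimal sketch that runs the multiline pattern over the six samples from the question, assuming they are joined with "\n" line endings. The class name and members are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class SampleCheck
{
    // The six sample inputs from the question.
    public static readonly string[] Samples =
    {
        "12345678",
        "1 2345678",
        "12 3 45678",
        "1234 5678",
        "12 34567 8",
        "1 2 3 4 5 6 7 8",
    };

    // Returns every whole line that contains 8 or more digits,
    // where the digits may be separated by keyboard spaces.
    public static string[] MatchLines(string text) =>
        Regex.Matches(text, @"(?m)^(?:\d[ ]*){8,}$")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

Calling SampleCheck.MatchLines(string.Join("\n", SampleCheck.Samples)) should return all six lines unchanged, while a line with only seven digits, such as "1234 567", is not returned. Note that in .NET the $ anchor only matches before "\n", so text with "\r\n" line endings would need extra handling.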

Or the following, if it is part of a bigger regex and needs a group:

string result = Regex.Match(input, @"...((?:\d[ ]*){8,})...").Groups[1].Value;

And feel free to replace the [ ] with a .NET class subtraction, or a non-.NET explicit whitespace class:

@"(?:\d[\s-[\r\n]]*){8,}"
// Or . . .
@"(?:\d[\f\t\v\u00A0\u2028\u2029\u0020]*){8,}"
