RegEx - Complex Regex Function, Ignore Spaces, Negate Only Certain Letters


Problem description

Hello everyone,

I am writing a very specific and complex RegEx function and I can't figure out how to complete certain parts of it. I have more than one question about it:

The string I am searching for looks like this: 39SWB20002000. It is an MGRS coordinate.
The string can be written in a few different ways: 39SWB20002000, 39S WB 2000 2000, 39S WB 20002000, etc.

I am confused on how to write regEx for the following parameters:

The first one or two characters are a number from 1 to 60, written as 01-60 or 1-60.

The next character can only be a letter C-X or c-x, but not I, i, O or o.

The two characters after that can be any letter A-Z or a-z, again excluding I, i, O and o.

The last part of the coordinate is a pair of numbers (2000 & 2000 in the example above). They can be written in several different ways; each number can have 1 to 6 digits, but both must have the same number of digits.

Here is the RegEx I have so far:

[0-6][0-9][C-HJ-NP-Xc-hj-np-x][A-HJ-NP-Za-hj-np-z]{2}



What is the best way to do this?

-Kyle

Recommended answer

I would split the whole thing into scanning and parsing.

According to your spec, the overall pattern looks as follows:
begin, 1-to-2-digits, 1-char, 2-chars, 2-evenly-split-digit-groups-of-up-to-4-digits-each, end


Between any two tokens there may be zero or more spaces.

Let's define each token and its regex so that each token is one regex group, and then parse the tokens group by group:

string input = ...;
...
string[] tokens =
{ @"(\d\d?)"         // group 1
, @"([a-zA-Z])"      // group 2
, @"([a-zA-Z]{2})"   // group 3
, @"(?:(\d{4})\s*(\d{4})|(\d{3})\s*(\d{3})|(\d{2})\s*(\d{2})|(\d{1})\s*(\d{1}))" // groups  4/5, 6/7, 8/9, 10/11
};
string pattern = @"^\s*" + string.Join(@"\s*", tokens) + @"\s*$";

// Scan: check the overall shape only.
Match match = Regex.Match(input, pattern);
if (!match.Success) Error("...");

// Parse: apply the range and letter rules to each group.
int n = int.Parse(match.Groups[1].Value);
if (n < 1 || n > 60) Error("...");           // zone number must be 1-60
string a = match.Groups[2].Value;
if (Regex.IsMatch(a, @"[abioyzABIOYZ]")) Error("...");   // only C-X without I and O
string b = match.Groups[3].Value;
if (Regex.IsMatch(b, @"[ioIO]")) Error("...");           // any letter except I and O

// Exactly one of the group pairs 4/5, 6/7, 8/9, 10/11 has matched.
int u = int.Parse(match.Groups[4].Success ? match.Groups[4].Value
                : match.Groups[6].Success ? match.Groups[6].Value
                : match.Groups[8].Success ? match.Groups[8].Value
                : match.Groups[10].Value);
int v = int.Parse(match.Groups[5].Success ? match.Groups[5].Value
                : match.Groups[7].Success ? match.Groups[7].Value
                : match.Groups[9].Success ? match.Groups[9].Value
                : match.Groups[11].Value);
...
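The scan-then-parse split can also be sketched as a small self-contained program. This is only an illustration under stated assumptions: MgrsParser, TryParse and MaxDigits are hypothetical names, the digit part is captured loosely and checked for equal lengths in code instead of using one alternative per length, and [0-9] replaces \d because \d in .NET also matches non-ASCII decimal digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MgrsParser
{
    // Assumption taken from the question: each number has 1 to 6 digits.
    private const int MaxDigits = 6;

    // Scan step: fix only the overall shape. Group 4 is either one run of
    // digits or two runs separated by whitespace.
    private static readonly Regex Shape = new Regex(
        @"^\s*([0-9][0-9]?)\s*([a-zA-Z])\s*([a-zA-Z]{2})\s*([0-9]+(?:\s+[0-9]+)?)\s*$");

    // Parse step: apply the range and letter rules in plain code.
    public static bool TryParse(string input, out int zone, out string letters,
                                out string easting, out string northing)
    {
        zone = 0; letters = ""; easting = ""; northing = "";
        Match m = Shape.Match(input);
        if (!m.Success) return false;

        int n = int.Parse(m.Groups[1].Value);
        if (n < 1 || n > 60) return false;                      // zone 1-60

        string band = m.Groups[2].Value, square = m.Groups[3].Value;
        if (Regex.IsMatch(band, "[abioyzABIOYZ]")) return false; // C-X, no I/O
        if (Regex.IsMatch(square, "[ioIO]")) return false;       // A-Z, no I/O

        string[] parts = Regex.Split(m.Groups[4].Value, @"\s+");
        string e, nn;
        if (parts.Length == 2) { e = parts[0]; nn = parts[1]; }
        else
        {
            // One unbroken run: it must split evenly into two halves.
            string all = parts[0];
            if (all.Length % 2 != 0) return false;
            e = all.Substring(0, all.Length / 2);
            nn = all.Substring(all.Length / 2);
        }
        if (e.Length != nn.Length || e.Length > MaxDigits) return false;

        zone = n; letters = (band + square).ToUpperInvariant();
        easting = e; northing = nn;
        return true;
    }

    public static void Main()
    {
        string[] samples = { "39SWB20002000", "39S WB 2000 2000", "39S WB 200 2000", "61SWB20002000" };
        foreach (string s in samples)
        {
            bool ok = TryParse(s, out int z, out string l, out string e, out string n);
            Console.WriteLine(ok ? $"{s} -> {z} {l} {e} {n}" : $"{s} -> rejected");
        }
    }
}
```

The first two samples should parse to the same fields; the third has groups of unequal length and the fourth has a zone number above 60, so both should be rejected.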


Cheers
Andi


This is my final version:
[^\d]\s*([6][0]|[1-5][0-9]|[0]*[1-9])\s*[A-HJ-NP-Xc-hj-np-z]\s*[A-HJ-NP-Za-hj-np-z]{2}\s*([\d]{8}\s+[\d]{8}|[\d]{7}\s+[\d]{7}|[\d]{6}\s+[\d]{6}|[\d]{5}\s+[\d]{5}|[\d]{4}\s+[\d]{4}|[\d]{3}\s+[\d]{3}|[\d]{2}\s+[\d]{2}|[\d]{1}\s+[\d]{1}|\d{16}|\d{14}|\d{12}|\d{10}|\d{8}|\d{6}|\d{4}|\d{2})\s*[^\d]


It's lengthy but it works rather well: it will parse through documents and pick out coordinates without much extra coding.

Here's the breakdown. The expression first looks for '60'; if that fails it tries 10-59, and then 01-09 or 1-9:

[^\d]\s*
(
    [6][0]|
    [1-5][0-9]|
    [0]*[1-9]
)
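The zone-number alternation can be tried on its own. The sketch below assumes a hypothetical ZoneDemo class, anchors the pattern with ^ and $ so the whole string must be a zone number, and uses 0? in place of [0]* so that at most one leading zero is accepted; that tightening is an assumption, not part of the posted pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZoneDemo
{
    // Same three alternatives as in the breakdown: 60, then 10-59, then 1-9
    // with an optional single leading zero.
    public static readonly Regex Zone = new Regex(@"^(60|[1-5][0-9]|0?[1-9])$");

    public static bool IsZone(string s) => Zone.IsMatch(s);

    public static void Main()
    {
        foreach (string s in new[] { "1", "01", "39", "60", "0", "00", "61", "001" })
            Console.WriteLine($"{s,-3} -> {IsZone(s)}");
    }
}
```

Only the first four inputs are valid zone numbers under these rules.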


The regex then searches for three letters (excluding I and O), with optional whitespace between the first and second letter:

\s*
[A-HJ-NP-Xc-hj-np-z]
\s*
[A-HJ-NP-Za-hj-np-z]{2}
\s*
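In .NET, "a range minus certain letters" can also be written with character class subtraction, [base-[excluded]]. The sketch below assumes the rules as stated in the question (first letter C-X, next two letters A-Z, never I or O, either case) and a hypothetical LetterDemo class. Note that the first class as posted, [A-HJ-NP-Xc-hj-np-z], also accepts A, B, y and z, which those rules exclude.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterDemo
{
    // [C-Xc-x-[IiOo]] : C-X in either case, minus I and O.
    // [A-Za-z-[IiOo]] : any letter in either case, minus I and O.
    public static readonly Regex Letters = new Regex(
        @"^\s*[C-Xc-x-[IiOo]]\s*[A-Za-z-[IiOo]]{2}\s*$");

    public static bool IsLetterPart(string s) => Letters.IsMatch(s);

    public static void Main()
    {
        foreach (string s in new[] { "SWB", "S WB", "swb", "IWB", "SOB", "AWB", "YWB" })
            Console.WriteLine($"{s,-4} -> {IsLetterPart(s)}");
    }
}
```

Character class subtraction is specific to .NET and a few other engines, so the explicit ranges above remain the portable form.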


Then it searches for the grid numbers at the end. They must be either a single run with an even number of digits (2 through 16) or two equal-length groups of digits separated by whitespace. The alternatives with whitespace come first (longest first), followed by the even-length runs.

(
    [\d]{8}\s+[\d]{8}|
    [\d]{7}\s+[\d]{7}|
    [\d]{6}\s+[\d]{6}|
    [\d]{5}\s+[\d]{5}|
    [\d]{4}\s+[\d]{4}|
    [\d]{3}\s+[\d]{3}|
    [\d]{2}\s+[\d]{2}|
    [\d]{1}\s+[\d]{1}|
    \d{16}|
    \d{14}|
    \d{12}|
    \d{10}|
    \d{8}|
    \d{6}|
    \d{4}|
    \d{2}
)
\s*[^\d]
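Because the sixteen alternatives follow a regular scheme, the list can be generated instead of typed out. A minimal sketch, assuming a hypothetical GridDemo.BuildGridPattern helper and using [0-9] in place of \d (in .NET, \d also matches non-ASCII decimal digits):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GridDemo
{
    // Builds the alternation in the same order as above: two equal-length
    // groups separated by whitespace (longest first), then even-length runs.
    public static string BuildGridPattern(int maxDigits)
    {
        var lengths = Enumerable.Range(0, maxDigits).Select(i => maxDigits - i).ToList();
        var spaced = lengths.Select(n => $@"[0-9]{{{n}}}\s+[0-9]{{{n}}}");
        var joined = lengths.Select(n => $@"[0-9]{{{2 * n}}}");
        return "(?:" + string.Join("|", spaced.Concat(joined)) + ")";
    }

    public static void Main()
    {
        string grid = "^" + BuildGridPattern(8) + "$";
        foreach (string s in new[] { "2000 2000", "20002000", "200 2000", "2000200" })
            Console.WriteLine($"{s,-10} -> {Regex.IsMatch(s, grid)}");
    }
}
```

Changing the maximum length then means changing one argument rather than editing the pattern by hand.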


Finally, the whole pattern is wrapped in [^\d]\s* ... \s*[^\d], which requires a non-digit character on each side, with optional whitespace in between. If the MGRS is surrounded by more digits, it is probably not an MGRS coordinate.
Now all I have to do is trim the extra character off the beginning and end of each match.
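One possible way to avoid the trimming step is to replace the surrounding [^\d] with lookarounds, which check the neighbouring character without making it part of the match. The sketch below is an alternative, not the posted method: FindDemo and FindAll are hypothetical names, and the body of the pattern is copied unchanged from the final regex above. Unlike [^\d], the lookarounds also succeed at the very start or end of the text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FindDemo
{
    // (?<!\d) : not preceded by a digit.  (?!\d) : not followed by a digit.
    // Neither assertion consumes a character, so nothing has to be trimmed.
    public static readonly Regex Candidate = new Regex(
        @"(?<!\d)" +
        @"([6][0]|[1-5][0-9]|[0]*[1-9])\s*[A-HJ-NP-Xc-hj-np-z]\s*[A-HJ-NP-Za-hj-np-z]{2}\s*" +
        @"([\d]{8}\s+[\d]{8}|[\d]{7}\s+[\d]{7}|[\d]{6}\s+[\d]{6}|[\d]{5}\s+[\d]{5}|" +
        @"[\d]{4}\s+[\d]{4}|[\d]{3}\s+[\d]{3}|[\d]{2}\s+[\d]{2}|[\d]{1}\s+[\d]{1}|" +
        @"\d{16}|\d{14}|\d{12}|\d{10}|\d{8}|\d{6}|\d{4}|\d{2})" +
        @"(?!\d)");

    // Returns the text of every candidate coordinate found in a document.
    public static string[] FindAll(string text) =>
        Candidate.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string doc = "Grid 39S WB 2000 2000, then 4QFJ123678. Not this: x139SWB20002000";
        foreach (string hit in FindAll(doc))
            Console.WriteLine("[" + hit + "]");
    }
}
```

In the sample text the last candidate is skipped because its zone number is directly preceded by another digit.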

-Kyle

