Why doesn't finite repetition in lookbehind work in some flavors?


Problem description

I want to parse the 2 digits in the middle from a date in dd/mm/yy format but also allowing single digits for day and month.

This is what I came up with:

(?<=^[\d]{1,2}\/)[\d]{1,2}

I want a 1- or 2-digit number, [\d]{1,2}, preceded by a 1- or 2-digit number and a slash, ^[\d]{1,2}\/.

This doesn't work on many combinations, I have tested 10/10/10, 11/12/13, etc...

But to my surprise (?<=^\d\d\/)[\d]{1,2} worked.

But the [\d]{1,2} should also match if \d\d did, or am I wrong?

Solution

On lookbehind support

Major regex flavors support lookbehind to varying degrees; some impose restrictions, and some don't support it at all.

  • JavaScript: not supported before ES2018; ES2018 and later support lookbehind
  • Python: fixed length only
  • Java: finite length only
  • .NET: no restriction
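
A quick way to see which category an engine falls into is to try compiling patterns of each kind. The following is a minimal sketch for Python's built-in re module only; the helper name compiles is hypothetical and not part of any library.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

fixed = compiles(r'(?<=^\d\d/)\d{1,2}')       # fixed-width lookbehind
bounded = compiles(r'(?<=^\d{1,2}/)\d{1,2}')  # finite but variable-width
unbounded = compiles(r'(?<=^\d+/)\d{1,2}')    # unbounded repetition

print(fixed, bounded, unbounded)
```

Under Python's re, only the first pattern compiles, which is consistent with the "fixed length only" entry above.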


Python

In Python's built-in re module, where only fixed-length lookbehind is supported, your original pattern raises an error because \d{1,2} does not have a fixed length. You can work around this by alternating between two different fixed-length lookbehinds, for example:

(?<=^\d\/)\d{1,2}|(?<=^\d\d\/)\d{1,2}

Or perhaps you can put both lookbehinds as alternates of a non-capturing group:

(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}

(note that you can just use \d without the brackets).

That said, it's probably much simpler to use a capturing group instead:

^\d{1,2}\/(\d{1,2})

Note that findall returns what group 1 captures if the pattern has exactly one group. Capturing groups are more widely supported than lookbehind, and often lead to a more readable pattern (such as in this case).

This snippet illustrates all of the above points:

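import re

# Two fixed-width lookbehinds as alternatives of a non-capturing group
p = re.compile(r'(?:(?<=^\d\/)|(?<=^\d\d\/))\d{1,2}')

print(p.findall("12/34/56"))   # ['34']
print(p.findall("1/23/45"))    # ['23']

# Capturing group instead of lookbehind
p = re.compile(r'^\d{1,2}\/(\d{1,2})')

print(p.findall("12/34/56"))   # ['34']
print(p.findall("1/23/45"))    # ['23']

# Variable-width lookbehind is rejected at compile time
try:
    p = re.compile(r'(?<=^\d{1,2}\/)\d{1,2}')
except re.error as e:
    print(e)                   # look-behind requires fixed-width pattern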
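
The patterns above also work on multi-line input if ^ is allowed to match at the start of each line. The following is a small sketch using re.MULTILINE; the sample text and the variable names are made up for illustration.

```python
import re

# Illustrative input: one date per line
text = "12/34/56 date\n1/23/45 another date\n"

# Capturing-group approach; re.MULTILINE lets ^ match at each line start.
by_group = re.findall(r'^\d{1,2}/(\d{1,2})', text, flags=re.MULTILINE)

# Lookbehind approach with two fixed-width alternatives.
by_lookbehind = re.findall(
    r'(?:(?<=^\d/)|(?<=^\d\d/))\d{1,2}', text, flags=re.MULTILINE
)

print(by_group)
print(by_lookbehind)
```

Both calls should return the middle field of each line; the capturing-group version is the shorter of the two and does not depend on lookbehind support.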


Java

Java supports only finite-length lookbehind, so you can use \d{1,2} like in the original pattern. This is demonstrated by the following snippet:

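    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class LookbehindDemo {
        public static void main(String[] args) {
            String text =
                "12/34/56 date\n" +
                "1/23/45 another date\n";

            // The lookbehind has a finite maximum length, so Java accepts it
            Pattern p = Pattern.compile("(?m)(?<=^\\d{1,2}/)\\d{1,2}");
            Matcher m = p.matcher(text);
            while (m.find()) {
                System.out.println(m.group());
            } // "34", "23"
        }
    }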

Note that (?m) is the embedded form of the Pattern.MULTILINE flag, so that ^ matches the start of every line. Note also that since \ is an escape character in string literals, you must write "\\" to get one backslash in Java.


C-Sharp

C# supports arbitrary regex inside a lookbehind, including unbounded repetition. The following snippet shows how you can use + repetition in a lookbehind:

var text = @"
1/23/45
12/34/56
123/45/67
1234/56/78
";

Regex r = new Regex(@"(?m)(?<=^\d+/)\d{1,2}");
foreach (Match m in r.Matches(text)) {
  Console.WriteLine(m);
} // "23", "34", "45", "56"

Note that unlike Java, in C# you can use an @-quoted (verbatim) string so that you don't have to escape \.

For completeness, here's how you'd use the capturing group option in C#:

Regex r = new Regex(@"(?m)^\d+/(\d{1,2})");
foreach (Match m in r.Matches(text)) {
  Console.WriteLine("Matched [" + m + "]; month = " + m.Groups[1]);
}

Given the previous text, this prints:

Matched [1/23]; month = 23
Matched [12/34]; month = 34
Matched [123/45]; month = 45
Matched [1234/56]; month = 56
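
For comparison, here is a sketch of the same capturing-group approach in Python, where the unbounded lookbehind (?<=^\d+/) is not accepted by the built-in re module. The variable names are chosen for illustration only.

```python
import re

text = """
1/23/45
12/34/56
123/45/67
1234/56/78
"""

# Python's re rejects (?<=^\d+/), so capture the second field instead.
pattern = re.compile(r'^\d+/(\d{1,2})', re.MULTILINE)

months = [m.group(1) for m in pattern.finditer(text)]
for m in pattern.finditer(text):
    print(f"Matched [{m.group(0)}]; month = {m.group(1)}")
```

Because the first field is consumed by the match rather than asserted by a lookbehind, its length does not need to be fixed or even bounded.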
