"Vertical" regex matching in an ASCII "image"


Question

Note: This is a question about the possibilities of modern regex flavors. It is not about the best way to solve this using other methods. It is inspired by an earlier question, but that one is not restricted to regex.

In an ASCII "image"/art/map/string like:

....X.......
..X..X...X....
X.X...X..X.....
X....XXXXXX.....
X..XXX...........
.....X..........
..............X
..X...........X....
..X...........X....X...
....X.....

I'd like to find a simple vertical line formation of three Xs:

X
X
X

The number of lines in the image is variable, and the width of each line is variable too.

With regex (PCRE/PHP, Perl, .NET or similar), is it possible to:

  1. Determine whether such a formation exists
  2. Count the number of such formations/match the starting point of them all (4 in the above example)

Recommended answer

Answer to question 1

To answer the first question one could use:

(?xm)                    # ignore comments and whitespace, ^ matches beginning of line
^                        # beginning of line
(?:
    .                    # any character except \n
    (?=                  # lookahead
        .*+\n            # go to next line
        ( \1?+ . )       # add a character to the 1st capturing group
        .*+\n            # next line
        ( \2?+ . )       # add a character to the 2nd capturing group
    )
)*?                      # repeat as few times as needed
X .*+\n                  # X on the first line and advance to next line
\1?+                     # if 1st capturing group is defined, use it, consuming exactly the same number of characters as on the first line
X .*+\n                  # X on the 2nd line and advance to next line
\2?+                     # if 2nd capturing group is defined, use it, consuming exactly the same number of characters as on the first line
X                        # X on the 3rd line

Online demo

This expression works in Perl, PCRE, Java and should work in .NET.

The expression uses lookaheads with self-referencing capturing groups to add one character for every repetition of the lookahead (this is used to "count").

\1?+ means: if \1 matches (or is defined), consume it and don't give it back (don't backtrack). In this case it is equivalent to (?(1) \1 ), which means match \1 if \1 is defined.
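The "use the group only if it is defined" idea can be tried in isolation. The sketch below uses Python's re module purely for illustration (Python is not one of the flavors discussed here), with the conditional form (?(1)\1) rather than the possessive \1?+. The toy pattern, the function name is_balanced and the sample strings are made up for this example.

```python
import re

# Hypothetical toy pattern: a word with optional matching quotes.
# Group 1 captures the opening quote if there is one; (?(1)\1) then
# demands the same quote again only when group 1 took part in the match.
pattern = re.compile(r"""^(["'])?(\w+)(?(1)\1)$""")

def is_balanced(text):
    """Return True if text is a bare word or a word in matching quotes."""
    return pattern.match(text) is not None

print(is_balanced("word"))     # group 1 undefined, conditional matches nothing
print(is_balanced("'word'"))   # group 1 is ', so a closing ' is required
print(is_balanced("'word"))    # closing quote missing
print(is_balanced("'word\""))  # wrong closing quote
```

In the vertical-line expression the same mechanism lets the very first repetition run while \1 and \2 are still undefined.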

polygenelubricants explains this kind of lookahead with backreferences very nicely in his answer to "How can we match a^n b^n with Java regex?". (He has also written about other impressive tricks for Java regex involving backreferences and lookarounds.)

Answer to question 2

When only plain matching is used and the answer (count) has to be the number of matches, the answer to question 2 is:

It cannot be solved directly in regex flavors that have limited lookbehind, whereas other flavors such as Java and .NET can do it (see, for example, m.buettner's .NET solution).

Thus plain regex matches in Perl and PCRE (PHP, etc.) cannot directly answer this question in this case.

Assume that no variable-length lookbehinds are available.

You have to count, in some way, the number of characters on a line before an X.
The only way to do that is to match them, and since no variable-length lookbehinds are available, the match has to start (at least) at the beginning of the line.
If the match starts at the beginning of a line, you can get at most one match per line.

Since there can be multiple occurrences per line, this would not count them all and would not give a correct answer.
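The "at most one match per line" limitation is easy to see on a single row. The following sketch uses Python's re module only as a convenient way to run two small patterns (it is not one of the flavors under discussion); the row is the first line of Test #7 further down, and the variable names are made up for this example.

```python
import re

row = ".X..X.."

# Anchored at the start of the line: the prefix before the X is consumed as
# part of the match, so its length is known, but only one match fits per line.
anchored = [m.end() - 1 for m in re.finditer(r"(?m)^.*?X", row)]

# Unanchored: every X is found, but the match itself says nothing about how
# many characters precede it unless the engine can look back to the line start.
unanchored = [m.start() for m in re.finditer(r"X", row)]

print(anchored)    # [1]
print(unanchored)  # [1, 4]
```

The anchored pattern reports only the X in column 1, although the row holds a second X in column 4.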

On the other hand, if we accept the answer as the length of a match or substitution result, then the second question can be answered in PCRE and Perl (and other flavors).

This solution is based on/inspired by m.buettner's nice "partial PCRE solution".

One could simply replace all matches of the following expression with $3, getting the answer to question 2 (the number of patterns of interest) as the length of the resulting string.

^
(?:
    (?:                   # match .+? characters
        .
        (?=               # counting the same number on the following two lines
            .*+\n
            ( \1?+ . )
            .*+\n
            ( \2?+ . )
        )
    )+?
    (?<= X )              # till the above consumes an X
    (?=                   # that matches the following conditions
        .*+\n
        \1?+
        (?<= X )
        .*+\n
        \2?+
        (?<= X )
    )
    (?=                   # count the number of matches
        .*+\n
        ( \3?+ . )        # the number of matches = length of $3
    )
)*                        # repeat as long as there are matches on this line
.*\n?                     # remove the rest of the line

Which in Perl could be written as:

$in =~ s/regex/$3/gmx;
$count = length $in;

Online demo

This expression is similar to the solution to question 1 above, with some modifications: the X is included in the characters matched in the first lookahead, the whole thing is wrapped in a quantifier, and the number of matches of that quantifier is counted.

Short of direct matching, this is as close as it gets in terms of the extra code needed besides the regex, and it could be an acceptable answer to question 2.

Some test cases and results for the above solution. Each result shows the numerical answer (the length of the resulting string) and, in parentheses, the string left after the substitution(s).

Test #0:
--------------------
X
X
X

result: 1 (X)


Test #1:
--------------------
..X....
..X....
..X....

result: 1 (.)


Test #2:
--------------------
..X.X..
..X.X..
....X..

result: 1 (.)


Test #3:
--------------------
..X....
..X....
...X...

result: 0 ()


Test #4:
--------------------
..X....
...X...
..X....

result: 0 ()


Test #5:
--------------------
....X..
.X..X..
.X.....

result: 0 ()


Test #6:
--------------------
.X..X..
.X.X...
.X.X...

result: 1 (.)


Test #7:
--------------------
.X..X..
.X..X..
.X..X..

result: 2 (.X)


Test #8:
--------------------
XXX
XXX
XXX

result: 3 (XXX)


Test #9:
--------------------
X.X.X
XXXXX
XXXXX
.X.X.

result: 5 (XXXXX)


Test #10:
--------------------
1....X.......
2..X..X...X....
3X.X...X..X.....
4X....XXXXXX.....
5X..XXX...........
6.....X..........
7.........X....X
8..X......X....X....
9..X......X....X....X...
A....X.....
B.X..X..
C.....
XXX
XXX
XXX
.

result: 8 (3458.XXX)
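The results above can be cross-checked without any regex. The sketch below is plain Python, not part of the regex solution, and rests on one assumption about the substitution: for every row it leaves behind as many leading characters of the following row as there are formations starting on that row (which is what group 3 accumulates). The function name expected_result is made up for this example.

```python
def expected_result(image):
    """Return the string the substitution is expected to leave behind.

    For each row, count the X's that also have an X directly below them on
    the next two rows, then keep that many leading characters of the next
    row (assumed to mirror what group 3 holds in the regex).
    """
    rows = image.split("\n")
    out = []
    for i in range(len(rows) - 2):
        top, mid, bot = rows[i], rows[i + 1], rows[i + 2]
        # Only columns present on all three rows can hold a formation.
        width = min(len(top), len(mid), len(bot))
        count = sum(
            1 for c in range(width)
            if top[c] == mid[c] == bot[c] == "X"
        )
        out.append(mid[:count])
    return "".join(out)

test_7 = ".X..X..\n.X..X..\n.X..X.."
result = expected_result(test_7)
print(len(result), repr(result))  # 2 '.X'
```

The length of the returned string is the number of formations, so it can be compared directly with the numerical answers listed in the test cases.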
