"竖直QUOT;正则表达式匹配在ASCII和QUOT;图像" [英] "vertical" regex matching in an ASCII "image"

查看:146
本文介绍了"竖直QUOT;正则表达式匹配在ASCII和QUOT;图像"的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

注:这是一个关于现代化的regex口味的可能性问题。这不是用其他的方法来解决这个问题的最好办法。它是由较早问题的,但是,一个是不局限于正则表达式。

问题

在一个ASCII图像/艺术/图/字符串,如:

  .... X .......
..X..X ... X ....
X.X ... X..X .....
X .... XXXXXX .....
X..XXX ...........
..... X ..........
.............. X
..X ........... X ....
..X ........... X .... X ...
.... X .....
 

我想找到一个简单的垂直线形成的三个 X S:

  X
X
X
 

图像中的

行的数目是可变的,和的宽度的每个的线是可变的太

的问题(S)

使用正则表达式(PCRE / PHP,Perl和.NET或类似)是有可能:

  1. 确定这样一个编队的存在
  2. 计数这类地层的数/匹配它们所有的(在上述实施例4)的起点
解决方案

回答问题1

要回答一个可以使用的第一个问题:

 (?XM)#忽略注释和空白,^比赛开始行
^#开始行
(?:
    。 #除了\ñ任意字符
    (?=#前瞻
        * + \ N#去下一行
        (\ 1 +)#添加一个字符到第一个捕获组
        * + \ N#下一行
        (\ 2 +)#添加一个字符到第2个捕获组
    )
)*? #根据需要重复的几十倍
的X. * + \ N#X月的第一行和前进到下一行
\ 1 +#如果第一捕获组被定义,使用它,耗时的字符相同数量作为第一行上的
的X. * + \ N#X月2号线和前进到下一行
\ 2 +#如果位2st捕获组被定义,使用它,耗时的字符相同数量作为第一行上的
3号线X#x的
 

<大骨节病> 在线演示

这EX pression工作在Perl,PCRE,Java和应该在.NET。

这位前pression使用向前看符号与自参照捕获组添加字符超前的每一个重复(这是用来统计)。

\ 1 + 意味着,如果 \ 1 匹配(或定义)消耗它,也不要还给我(不走回头路)。在这种情况下,它等同于(?(1)\ 1)。这意味着匹配 \ 1 如果 \ 1 定义。

polygenelubricants 中的他对如何才能匹配^ NB ^ N与Java正则表达式的答案?。 (他还写了其他IM pressive技巧Java的正则表达式涉及反向引用和lookarounds。)

回答问题2

平原匹配

在仅仅使用匹配,并要求在比赛中的一些答案(计数),那么问题2的答案是:

不可以直接解决在具有有限的回顾后发正则表达式的味道。而其他的口味,如Java和.NET可以(例如在 m.buettner的.NET解决方案)。

因此​​,普通的正则表达式在Perl和PCRE(PHP等)匹配不能直接回答在这种情况下这个问题。

(半?)证明

假设没有变长lookbehinds可用。

您必须以某种方式之前, X 指望一个行的字符数。
只有这样,才能做到这一点是与它们匹配,并且由于没有可变长度lookbehinds可就得开始比赛(至少)在该行的开头。
如果你开始的比赛在一行的开头,你只能得到每行最多一个匹配。

由于可以每行多次出现,这将不计所有这些,也没有给出一个正确的答案。

长/间接解决方案

在另一方面,如果我们接受了答案的匹配或替换结果的长度,那么第二个问题的可以回答在PCRE和Perl(和其他口味)。

该解决方案是基于/启发 m.buettner真好部分PCRE解决方案

可以简单地取代下列前pression所有的比赛用 $ 3 ,得到的回答第二个问题作为长度(的利益模式的数量)结果字符串的。

  ^
(?:
    (?:?#匹配+字符
        。
        (?=#寄望于以下两行相同的数
            * + \ñ
            (\ 1 +)。
            * + \ñ
            (\ 2 +)。
        )
    )+&
    (小于4 = X)#,直到上述消耗的X
    (?=#匹配下列条件
        * + \ñ
        \ 1 +
        (小于4 = X)
        * + \ñ
        \ 2 +
        (小于4 = X)
    )
    (?=#伯爵匹配的数量
        * + \ñ
        (\ 3 +)匹配#数量=的$ 3长度
    )
)*#重复只要有这条线上比赛
* \ñ? #删除行的其余部分
 

这在Perl可以写成:

  $在=〜S /正则表达式/ $ 3 / GMX;
$计数=长度$;
 

<大骨节病> 在线演示

此前pression类似于溶液对问题1的上方,具有一些修改,以包括 X 在匹配在第一先行的字符,包裹着量词的比赛量词和计数值。

除了直接匹配,这是因为接近,因为它得到(额外的code除了正则表达式的明智),并可能是一个可以接受的答案问题2。

测试用例

一些测试用例和结果上述溶液。结果显示数值答案(结果字符串的长度),并在括号中的结果字符串替代(S)之后。

 测试#0:
--------------------
X
X
X

结果:1(X)


测试#1:
--------------------
..X ....
..X ....
..X ....

结果:1(。)


测试2:
--------------------
..X.X ..
..X.X ..
.... X ..

结果:1(。)


测试#3:
--------------------
..X ....
..X ....
... X ...

结果:0()


测试4:
--------------------
..X ....
... X ...
..X ....

结果:0()


试验#5:
--------------------
.... X ..
.X..X ..
.X .....

结果:0()


测试6:
--------------------
.X..X ..
.X.X ...
.X.X ...

结果:1(。)


测试7:
--------------------
.X..X ..
.X..X ..
.X..X ..

结果:2(.X)


测试#8:
--------------------
XXX
XXX
XXX

结果:3(XXX)


测试#9:
--------------------
X.X.X
XXXXX
XXXXX
.X.X。

结果:5(XXXXX)


试验#10:
--------------------
1 .... X .......
2..X..X ... X ....
3X.X ... X..X .....
4X .... XXXXXX .....
5X..XXX ...........
6 ..... X ..........
7 ......... X .... X
8..X ...... X .... X ....
9..X ...... X .... X .... X ...
斧头.....
B.X..X ..
C.....
XXX
XXX
XXX
。

结果:8(3458.XXX)
 

Note: This is a question about possibilities of modern regex flavors. It's not about the best way to solve this using other methods. It's inspired by an earlier question, but that one is not restricted to regex.

The Problem

In an ASCII "image"/art/map/string like:

....X.......
..X..X...X....
X.X...X..X.....
X....XXXXXX.....
X..XXX...........
.....X..........
..............X
..X...........X....
..X...........X....X...
....X.....

I'd like to find a simple vertical line formation of three Xs:

X
X
X

The number of lines is variable in the image, and the width of each line is variable too.

The Question(s)

With regex (PCRE/PHP, Perl, .NET or similar) is it possible to:

  1. Determine if such a formation exists
  2. Count the number of such formations/match the starting point of them all (4 in the above example)

解决方案

Answer to question 1

To answer the first question one could use:

(?xm)                    # ignore comments and whitespace, ^ matches beginning of line
^                        # beginning of line
(?:
    .                    # any character except \n
    (?=                  # lookahead
        .*+\n            # go to next line
        ( \1?+ . )       # add a character to the 1st capturing group
        .*+\n            # next line
        ( \2?+ . )       # add a character to the 2nd capturing group
    )
)*?                      # repeat as few times as needed
X .*+\n                  # X on the first line and advance to next line
\1?+                     # if 1st capturing group is defined, use it, consuming exactly the same number of characters as on the first line
X .*+\n                  # X on the 2nd line and advance to next line
\2?+                     # if 2st capturing group is defined, use it, consuming exactly the same number of characters as on the first line
X                        # X on the 3rd line

Online demo

This expression works in Perl, PCRE, Java and should work in .NET.

The expression uses lookaheads with self referencing capturing groups to add a character for every repetition of the lookahead (this is used to "count").

\1?+ means if \1 matches (or is defined) consume it, and don't give it back (don't backtrack). In this case it's equivalent to (?(1) \1 ). Which means match \1 if \1 is defined.

polygenelubricants explains this kinds of lookaheads with backreferences very nicely in his answer for How can we match a^n b^n with Java regex?. (He has also written about other impressive tricks for Java regex involving backreferences and lookarounds.)

Answer to question 2

Plain matching

When just using matching and requiring the answer (count) in the number of matches, then the question 2 answer would be:

It can not be directly solved in regex flavors that have a limited lookbehind. While other flavors like Java and .NET could (as for example in m.buettner's .NET solution).

Thus plain regex matches in Perl and PCRE (PHP, etc) cannot directly answer this question in this case.

(Semi?)proof

Assume that no variable length lookbehinds are available.

You have to in some way count the number of characters on a line before an X.
Only way to do that is to match them, and since no variable length lookbehinds are available you have to start the match (at least) at the beginning of the line.
If you start the match at the beginning of a line you can only get at most one match per line.

Since there can be multiple occurrences per line, this would not count them all and would not give a correct answer.

Length/indirect solution

On the other hand if we accept the answer as the length of a match or substitution result, then the 2nd question can be answered in PCRE and Perl (and other flavors).

This solution is based on/inspired by m.buettner's nice "partial PCRE solution".

One could simply replace all matches of the following expression with $3, getting the answer to question two (the number of patterns of interests) as the length of the resulting string.

^
(?:
    (?:                   # match .+? characters
        .
        (?=               # counting the same number on the following two lines
            .*+\n
            ( \1?+ . )
            .*+\n
            ( \2?+ . )
        )
    )+?
    (?<= X )              # till the above consumes an X
    (?=                   # that matches the following conditions
        .*+\n
        \1?+
        (?<= X )
        .*+\n
        \2?+
        (?<= X )
    )
    (?=                   # count the number of matches
        .*+\n
        ( \3?+ . )        # the number of matches = length of $3
    )
)*                        # repeat as long as there are matches on this line
.*\n?                     # remove the rest of the line

Which in Perl could be written as:

$in =~ s/regex/$3/gmx;
$count = length $in;

Online demo

This expression is similar to the solution to question 1 above, with some modifications to include X in the characters matched in the first lookahead, wrapped with a quantifier and counting number of matches of the quantifier.

Except for direct matches this is as close as it gets (extra code wise besides regex), and could be an acceptable answer to question 2.

Test cases

Some test cases and results for the above solution. Result showing the numerical answer (length of the resulting string) and in parenthesis the resulting string after the the substitution(s).

Test #0:
--------------------
X
X
X

result: 1 (X)


Test #1:
--------------------
..X....
..X....
..X....

result: 1 (.)


Test #2:
--------------------
..X.X..
..X.X..
....X..

result: 1 (.)


Test #3:
--------------------
..X....
..X....
...X...

result: 0 ()


Test #4:
--------------------
..X....
...X...
..X....

result: 0 ()


Test #5:
--------------------
....X..
.X..X..
.X.....

result: 0 ()


Test #6:
--------------------
.X..X..
.X.X...
.X.X...

result: 1 (.)


Test #7:
--------------------
.X..X..
.X..X..
.X..X..

result: 2 (.X)


Test #8:
--------------------
XXX
XXX
XXX

result: 3 (XXX)


Test #9:
--------------------
X.X.X
XXXXX
XXXXX
.X.X.

result: 5 (XXXXX)


Test #10:
--------------------
1....X.......
2..X..X...X....
3X.X...X..X.....
4X....XXXXXX.....
5X..XXX...........
6.....X..........
7.........X....X
8..X......X....X....
9..X......X....X....X...
A....X.....
B.X..X..
C.....
XXX
XXX
XXX
.

result: 8 (3458.XXX)

这篇关于&QUOT;竖直QUOT;正则表达式匹配在ASCII和QUOT;图像&QUOT;的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆