grep -P to find lines containing exactly n A's followed by exactly n B's

Question

Is it possible to write a grep -P (PCRE) command that prints only the lines consisting of exactly n A's followed by exactly n B's, with no other characters? For example, these are valid matches:

AB
AAABBB
AAAAAAABBBBBBB
AAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBB

While these are not:

AAABB
ABBBBBB
BBBA
ABABA
BBBBBBBB

Recommended answer

With regular expressions in the strict formal sense, you can't do this: they can only match regular languages (Type 3 in the Chomsky hierarchy of languages), while what you want to match is a classic example of a context-free (Type 2) language that is not regular.

Luckily, Perl regular expressions (and PCRE, which grep -P uses) aren't very regular in the formal language theory sense. You can match this using a recursive regular expression:

$ perl -ne 'print if /^((?>A(?1)B|))$/' input.txt
AB
AAABBB
AAAAAAABBBBBBB
AAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBB
$ grep -P '^((?>A(?1)B|))$' input.txt  
AB
AAABBB
AAAAAAABBBBBBB
AAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBB

(Where input.txt contains all your test cases).
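To reproduce the run above, a minimal sketch could build input.txt from the nine sample lines in the question and then apply the same pattern. It assumes a grep built with PCRE support (for example GNU grep) and overwrites any input.txt in the current directory.

```shell
# Assumption: grep supports -P (PCRE). This overwrites ./input.txt.
# Write the question's four valid and five invalid sample lines, one per line.
printf '%s\n' \
  AB \
  AAABBB \
  AAAAAAABBBBBBB \
  AAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBB \
  AAABB \
  ABBBBBB \
  BBBA \
  ABABA \
  BBBBBBBB > input.txt

# Print only the lines made of n A's followed by n B's.
grep -P '^((?>A(?1)B|))$' input.txt
```

Only the first four lines of the file should be printed, matching the output shown in the answer.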

This matches either an empty string (0 A's followed by 0 B's), or a string that starts with A, continues with a successful recursive match of group 1 (via (?1)) against the rest of the string minus the first and last characters, and ends with B. It therefore fails if a B appears before an A, an A appears after a B, or the numbers of A's and B's differ. (?>regex) is an atomic group, an optimization that prevents the engine from backtracking into the group once it has matched.
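The same peel-one-A-and-one-B logic can be written out as a plain shell function, which may make the recursion easier to follow. This is only an illustration of the idea, not how the regex engine executes the pattern; the function name matches_anbn is made up for this sketch.

```shell
# Hypothetical helper mirroring ^((?>A(?1)B|))$ for a whole string.
matches_anbn() {
  case $1 in
    '')                 # empty string: 0 A's followed by 0 B's
      return 0 ;;
    A*B)                # starts with A and ends with B
      inner=${1#A}      # drop the leading A
      inner=${inner%B}  # drop the trailing B
      matches_anbn "$inner" ;;  # recurse on what is left in between
    *)                  # anything else cannot be A^n B^n
      return 1 ;;
  esac
}

for s in AB AAABBB AAABB BBBA ABABA; do
  if matches_anbn "$s"; then
    echo "$s: match"
  else
    echo "$s: no match"
  fi
done
```

Each recursive call removes one A from the front and one B from the back, so a string is accepted only if this ends at the empty string.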

If you want to enforce n >= 1, a slight variation lifts one pair of A and B outside of the recursive section: ^A((?>A(?1)B|))B$.
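The practical difference between the two patterns shows up on empty lines. As a small sketch, assuming a grep with PCRE support and a throwaway file named sample.txt (a name chosen here for illustration), counting matches with each pattern gives:

```shell
# Assumption: grep supports -P. sample.txt is a hypothetical scratch file.
# Four lines: an empty line, AB, AABB, and the unbalanced AAB.
printf '\nAB\nAABB\nAAB\n' > sample.txt

# n >= 0: the empty line counts as a match (0 A's, 0 B's).
grep -cP '^((?>A(?1)B|))$' sample.txt     # prints 3

# n >= 1: one A...B pair is required, so the empty line is rejected.
grep -cP '^A((?>A(?1)B|))B$' sample.txt   # prints 2
```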
