Generate all Permutations of text from a regex pattern in C#


Problem description

I have a regex pattern, and I want to generate all the strings that the pattern would match.

Example:

var pattern = "^My (?:biological|real)? Name is Steve$";
var permutations = getStringPermutations(pattern);

This would return the list of strings below:

My Name is Steve

My real Name is Steve

My biological Name is Steve

Update: Obviously a regex can have an infinite number of matches, so I only want to generate strings from optional string literals, as in the (?:biological|real)? from my example above. Something like (.)* has too many matches, so I will not generate anything from it.

Solution

If you restrict yourself to the subset of regular expressions that are anchored at both ends, and involve only literal text, single-character wildcards, and alternation, the matching strings should be pretty easy to enumerate. I'd probably rewrite the regex as a BNF grammar and use that to generate an exhaustive list of matching strings. For your example:

<lang>   -> <begin> <middle> <end>
<begin>  -> "My "
<middle> -> "" | "real" | "biological"
<end>    -> " Name is Steve"

Start with the productions that have only terminal symbols on the RHS, and enumerate all the possible values that the nonterminal on the LHS could take. Then work your way up to the productions with nonterminals on the RHS. For concatenation of nonterminal symbols, form the Cartesian product of the sets represented by each RHS nonterminal. For alternation, take the union of the sets represented by each option. Continue until you've worked your way up to <lang>, then you're done.
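The bottom-up procedure above can be sketched in C# roughly as follows. This is only an illustration under stated assumptions: the `Grammar` class and its `Alt`, `Union`, and `Concat` helpers are hypothetical names introduced here, not part of any library, and the grammar is hard-coded rather than derived from the regex.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: enumerate the example grammar <lang> -> <begin> <middle> <end>.
var begin  = Grammar.Alt("My ");
var middle = Grammar.Alt("", "real", "biological");
var end    = Grammar.Alt(" Name is Steve");

foreach (var s in Grammar.Concat(begin, middle, end))
    Console.WriteLine(s);

// Hypothetical helper class; the method names are illustrative only.
static class Grammar
{
    // A production whose RHS has only terminals: the set of its literal options.
    public static IEnumerable<string> Alt(params string[] literals) =>
        literals.Distinct();

    // Alternation of nonterminals: the union of the sets for each option.
    public static IEnumerable<string> Union(params IEnumerable<string>[] options) =>
        options.SelectMany(o => o).Distinct();

    // Concatenation: the Cartesian product of the sets, joined left to right.
    public static IEnumerable<string> Concat(params IEnumerable<string>[] parts) =>
        parts.Aggregate(
            (IEnumerable<string>)new[] { "" },
            (acc, part) => acc.SelectMany(prefix => part.Select(suffix => prefix + suffix)));
}
```

Note that the empty option in `<middle>` produces "My" followed by two spaces, because both `<begin>` and `<end>` carry their own space. That mirrors the regex in the question, which also requires both spaces when the optional group is absent.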

However, once you include the '*' or '+' operators, you have to contend with infinite numbers of matching strings. And if you also want to handle advanced features like backreferences...you're probably well on your way to something that's isomorphic to the Halting Problem!
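One way to stay inside the finite subset is to expand the pattern directly and refuse anything else. The sketch below is a minimal recursive-descent expander, assuming a pattern made only of literal characters, `(...)` or `(?:...)` groups, `|`, and a trailing `?`. The `RegexEnumerator` class and its `Expand` method are hypothetical names, and for simplicity it also rejects the `.` wildcard, escapes, and character classes rather than enumerating them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

foreach (var s in RegexEnumerator.Expand("^My (?:biological|real)? Name is Steve$"))
    Console.WriteLine(s);

// Hypothetical helper; supports only literals, groups, '|' and a trailing '?'.
static class RegexEnumerator
{
    public static List<string> Expand(string pattern)
    {
        // Strip the anchors; the enumeration assumes a fully anchored pattern.
        if (pattern.Length > 0 && pattern[0] == '^') pattern = pattern.Substring(1);
        if (pattern.Length > 0 && pattern[pattern.Length - 1] == '$')
            pattern = pattern.Substring(0, pattern.Length - 1);

        int pos = 0;
        var result = ParseAlternation(pattern, ref pos);
        if (pos != pattern.Length) throw new FormatException("Unbalanced ')' at " + pos);
        return result.Distinct().ToList();
    }

    // alternation := sequence ('|' sequence)*  -> union of the option sets
    static List<string> ParseAlternation(string p, ref int pos)
    {
        var options = ParseSequence(p, ref pos);
        while (pos < p.Length && p[pos] == '|')
        {
            pos++;
            options.AddRange(ParseSequence(p, ref pos));
        }
        return options;
    }

    // sequence := (group | literal)*  -> Cartesian product of the parts
    static List<string> ParseSequence(string p, ref int pos)
    {
        var current = new List<string> { "" };
        while (pos < p.Length && p[pos] != '|' && p[pos] != ')')
        {
            List<string> atom;
            char c = p[pos];
            if (c == '(')
            {
                pos++;
                // Skip the non-capturing marker if present.
                if (pos + 1 < p.Length && p[pos] == '?' && p[pos + 1] == ':') pos += 2;
                atom = ParseAlternation(p, ref pos);
                if (pos >= p.Length || p[pos] != ')') throw new FormatException("Missing ')'");
                pos++;
            }
            else if ("*+.[]{}\\?".IndexOf(c) >= 0)
            {
                // Unbounded repetition and the other constructs are rejected outright.
                throw new NotSupportedException($"'{c}' at {pos} is outside the supported subset");
            }
            else
            {
                atom = new List<string> { c.ToString() };
                pos++;
            }

            // Optional atom: add the empty string as one more alternative.
            if (pos < p.Length && p[pos] == '?') { atom.Add(""); pos++; }

            current = current
                .SelectMany(prefix => atom.Select(suffix => prefix + suffix))
                .ToList();
        }
        return current;
    }
}
```

Throwing `NotSupportedException` for `*` and `+` is a design choice for this sketch. An alternative would be to cap the repetition count, but that changes the question from "all matches" to "all matches up to some length".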
