Generative regular expressions

Question

Typically in our work we use regular expressions in capture or match operations.

However, regular expressions can be used - manually at least - to generate legal sentences that match the regular expression. Of course, some regular expressions can match infinitely long sentences, e.g., the expression .+.

I have a problem that could be solved by using a regular expression sentence generating algorithm.

In pseudocode, it would operate something like this:

re = generate("foo(bar|baz)?", max_match = 100);  #Don't give me more than 100 results
assert re == ("foobar", "foobaz", "foo");

What algorithm would perform this for me?

Recommended answer

Microsoft has an SMT-based, free-of-charge (MSRL-licensed) tool called "Rex" for this: http://research.microsoft.com/en-us/downloads/7f1d87be-f6d9-495d-a699-f12599cea030/

From the Introduction section of the "Rex: Symbolic Regular Expression Explorer" paper:

We translate (extended) regular expressions or regexes [5] into a symbolic representation of finite automata called SFAs. In an SFA, moves are labeled by formulas representing sets of characters rather than individual characters. An SFA A is translated into a set of (recursive) axioms that describe the acceptance condition for the strings accepted by A and build on the representation of strings as lists.

As the SMT solver can output all possible solutions within some size bound, this may be close to what you're looking for.
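
For the simple pattern in the question, the same "enumerate every solution" idea can be tried without an SMT solver. The following is only a minimal sketch, not how Rex works: it assumes a tiny regex subset (literal characters, grouping with parentheses, alternation with |, and the optional marker ?), for which the set of matching strings is always finite. The function name generate and its max_match parameter are taken from the question's pseudocode; the parser itself is a hypothetical illustration.

```python
def generate(pattern, max_match=100):
    """Enumerate the strings matched by a tiny regex subset.

    Supported (an assumption of this sketch, not a full regex engine):
    literal characters, grouping with (...), alternation with |, and ?.
    Anything that would need a length bound (*, +, ., classes) is rejected.
    """
    pos = 0  # current read position in the pattern

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def parse_alternation():
        # alternation := concatenation ('|' concatenation)*
        nonlocal pos
        results = parse_concatenation()
        while peek() == "|":
            pos += 1
            for s in parse_concatenation():
                if s not in results:
                    results.append(s)
        return results

    def parse_concatenation():
        # concatenation := repeat*; combine every prefix with every suffix
        results = [""]
        while peek() not in (None, "|", ")"):
            suffixes = parse_repeat()
            combined = []
            for prefix in results:
                for suffix in suffixes:
                    if prefix + suffix not in combined:
                        combined.append(prefix + suffix)
            results = combined
        return results

    def parse_repeat():
        # repeat := atom '?'?; an optional atom also yields the empty string
        nonlocal pos
        results = parse_atom()
        if peek() == "?":
            pos += 1
            if "" not in results:
                results.append("")
        return results

    def parse_atom():
        # atom := '(' alternation ')' | literal character
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            results = parse_alternation()
            if peek() != ")":
                raise ValueError("unbalanced parenthesis")
            pos += 1
            return results
        if ch in "?*+.[]{}\\^$":
            raise ValueError(f"unsupported syntax {ch!r} at position {pos}")
        pos += 1
        return [ch]

    results = parse_alternation()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return results[:max_match]  # don't return more than max_match results


print(generate("foo(bar|baz)?", max_match=100))
```

Note that this sketch builds the complete result list before truncating it to max_match, so it only suits small patterns. Supporting unbounded operators such as + or * would additionally need a maximum length, which is the role the size bound plays above.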

On a more statistical and less formal front, the Regexp::Genex module from CPAN can work as well: http://search.cpan.org/dist/Regexp-Genex/

You can use it with something like this:

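#!/usr/bin/env perl
use strict;
use warnings;
use Regexp::Genex ':all';

my $hits = 100;                  # maximum number of results to print
my $re   = qr/[a-z](123|456)/;
# Start from the length of the stringified pattern and grow from there.
local $Regexp::Genex::DEFAULT_LEN = length $re;
my %seen;
# Keep sampling for about two seconds ($^T is the script start time).
while ((time - $^T) < 2) {
    @seen{strings($re)} = ();    # hash keys de-duplicate the samples
    $Regexp::Genex::DEFAULT_LEN++;
}
my @found = sort keys %seen;     # keys only; the values are all undef
$#found = $hits - 1 if @found > $hits;   # cap the list at $hits entries
print "$_\n" for @found;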

Adjust the time and sample size as needed. Hope this helps!
