Regex-like syntax or CFG for generating cartesian product of concatenated string variables and literals

Question

I am writing a simulator, and would like to run studies by invoking many instances of the simulator with different sets of command-line arguments. I have read this question and several others, and they seem close, but I am not looking for random data fulfilling a particular regex; I would like the set of all strings that match the regex. An example input file would look something like this:

myprogram.{version1|version2} -arg1 {1|2|4} {-arg2|}

or:

myprogram.{0} -arg1 {1} {2}
0: "version1" "version2"
1: "1" "2" "4"
2: "-arg2" ""

and would produce:

myprogram.version1 -arg1 1 -arg2
myprogram.version1 -arg1 1
myprogram.version1 -arg1 2 -arg2
myprogram.version1 -arg1 2
myprogram.version1 -arg1 4 -arg2
myprogram.version1 -arg1 4
myprogram.version2 -arg1 1 -arg2
myprogram.version2 -arg1 1
myprogram.version2 -arg1 2 -arg2
myprogram.version2 -arg1 2
myprogram.version2 -arg1 4 -arg2
myprogram.version2 -arg1 4

I would imagine something like this already exists, I just don't know the correct term to search for. Any help would be much appreciated. I can implement an abstract technique or algorithm myself if need be, but if it's a pre-existing tool I would prefer it to be free (at least as in beer) and run on Linux.

I know I am probably leaving some details out, and I can be more specific where necessary rather than inundate people with a lot of detail up front. It is entirely possible that I am going about this the wrong way, and I welcome all solutions, even if they solve my problem in a different way.

Most importantly, this solution should not require me to write any extra parsing code if I want to add more argument options to the "cross-product" of strings I generate. I already have a Perl script that does this with a set of nested for loops over each "variable" that must change every time I change the number or nature of variables.

Recommended answer

As long as the braces are not nested, regular expressions will work fine. If you require nesting, you could add some extra recursion in the implementation language.
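
For the nested case, the recursion could look roughly like the sketch below. It is not part of the original answer: the names expand, _sequence and _group are made up for illustration, and it assumes that '|' and '}' are only special inside a group and that an unclosed '{' is an error.

```python
def expand(template):
    """Return every expansion of `template`; {a|b} groups may nest."""
    results, _ = _sequence(template, 0, "")
    return results


def _sequence(s, i, stops):
    # Parse literals and groups until a stop character or the end of s.
    results = [""]
    while i < len(s) and s[i] not in stops:
        if s[i] == "{":
            alts, i = _group(s, i + 1)
            # Cross every partial result with every alternative of the group.
            results = [r + a for r in results for a in alts]
        else:
            results = [r + s[i] for r in results]
            i += 1
    return results, i


def _group(s, i):
    # Parse "alt|alt|...}" starting just after the opening brace.
    alts = []
    while True:
        seq, i = _sequence(s, i, "|}")
        alts.extend(seq)
        if i >= len(s):
            raise ValueError("unbalanced '{' in template")
        if s[i] == "}":
            return alts, i + 1
        i += 1  # skip the '|' separator
```

With this sketch, expand('a{b|c{d|e}}f') gives ['abf', 'acdf', 'acef']. Because it builds the whole list in memory, it suits small studies better than very large cross-products.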

Here is an example in Python for the non-nested case:

import re

def make_choices(template):
    pat = re.compile(r'(.*?)\{([^{}]+)\}',re.S)

    # tokenize the string
    last_end = 0
    choices = []
    for match in pat.finditer(template):
        prefix, alts = match.groups()
        if prefix:
            choices.append((prefix,)) # as a tuple
        choices.append(alts.split("|"))
        last_end = match.end()

    suffix = template[last_end:]
    if suffix:
        choices.append((suffix,))

    # recursive inner function
    def chooser(index):
        if index >= len(choices):
            yield []
        else:
            for alt in choices[index]:
                for result in chooser(index+1):
                    result.insert(0,alt)
                    yield result

    for result in chooser(0):
        yield ''.join(result)

Example:

>>> for result in make_choices('myprogram.{version1|version2} -arg1 {1|2|4} {-arg2|}'):
...     print(result)
...
myprogram.version1 -arg1 1 -arg2
myprogram.version1 -arg1 1
myprogram.version1 -arg1 2 -arg2
myprogram.version1 -arg1 2
myprogram.version1 -arg1 4 -arg2
myprogram.version1 -arg1 4
myprogram.version2 -arg1 1 -arg2
myprogram.version2 -arg1 1
myprogram.version2 -arg1 2 -arg2
myprogram.version2 -arg1 2
myprogram.version2 -arg1 4 -arg2
myprogram.version2 -arg1 4
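
The second input format from the question (numbered placeholders followed by "N: ..." value lines) happens to be a valid Python format string, so it may not need a custom template parser at all. The following is a minimal sketch under that assumption. The helper names parse_spec and expand_numbered are hypothetical, the value lines are assumed to use shell-style quoting, and any literal braces in the template would have to be doubled for str.format.

```python
import itertools
import shlex


def parse_spec(text):
    """Split a spec into (template, list of value lists).

    The first line is the template; every later line is 'N: "a" "b" ...'.
    """
    lines = text.strip().splitlines()
    template = lines[0]
    values = {}
    for line in lines[1:]:
        index, _, rest = line.partition(":")
        values[int(index)] = shlex.split(rest)  # keeps "" as an empty string
    return template, [values[i] for i in range(len(values))]


def expand_numbered(template, values):
    """Yield the template filled with every combination of the values."""
    for combo in itertools.product(*values):
        yield template.format(*combo)


spec = '''
myprogram.{0} -arg1 {1} {2}
0: "version1" "version2"
1: "1" "2" "4"
2: "-arg2" ""
'''
template, values = parse_spec(spec)
commands = list(expand_numbered(template, values))
```

Adding another variable then only means adding a placeholder and a value line; itertools.product takes the place of the nested for loops.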

You could use os.system() to execute the commands from within Python:

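#!/usr/bin/env python3
import os
import sys

# make_choices() from the example above must be defined in (or imported into) this file

# Rejoin the command-line arguments into a single template string
template = ' '.join(sys.argv[1:])
failed = 0
total = 0
for command in make_choices(template):
    print(command)
    if os.system(command):  # a non-zero status means the command failed
        print('FAILED')
        failed += 1
    else:
        print('OK')
    total += 1

print()
print('%d of %d failed.' % (failed, total))

# Exit with status 1 if any command failed, 0 otherwise
sys.exit(1 if failed else 0)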

Then on the command line:

user:/home/> template.py 'program.{version1|version2}'
program.version1
OK
program.version2
FAILED

1 of 2 failed.
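
os.system() passes each command through the shell. If that is not wanted, a variant based on subprocess.run could be used instead. This is only a sketch, not part of the original answer: run_all is a hypothetical helper, and it assumes the generated commands can be split into arguments with shlex.split and need no shell features such as pipes or redirection.

```python
import shlex
import subprocess


def run_all(commands):
    """Run each command without a shell; return (failed, total)."""
    failed = 0
    total = 0
    for command in commands:
        argv = shlex.split(command)  # POSIX-shell-style splitting, no shell involved
        try:
            ok = subprocess.run(argv).returncode == 0
        except OSError:
            ok = False  # for example, the program does not exist
        print(command)
        print('OK' if ok else 'FAILED')
        if not ok:
            failed += 1
        total += 1
    return failed, total
```

A call such as run_all(make_choices(template)) would replace the loop in the script above; a command whose program cannot be started is counted as failed instead of raising.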
