给定语法规则,如何生成所有可能的字符串? [英] How can I generate all possible strings given a grammar rule?

查看:132
本文介绍了给定语法规则,如何生成所有可能的字符串?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在给定JSGF语法的情况下,我想生成它将映射到的所有终端字符串。例如,给定 A(B | C)D [E] ,我期望的输出将是:

I would like to, given a JSGF grammar, generate all the terminal strings it would map to. For example, given A (B | C) D [E], my desired output would be:

A B D E
A C D E
A B D
A C D

我决定从最简单的项目开始,即可选的支架,但很快就碰到了砖墙。它适用于1个项目,但不适用于具有替代项目的项目。任何意见,将不胜感激。

I decided to start with the easiest item, the optional brackets, but soon ran into a brick wall. It sort of works for 1 item, but not for an item with alternatives. Any advice would be appreciated.

我现在所拥有的:

import re
rule = raw_input('Enter the rule you want to test: ')
items = re.findall(r"\w[\w ]*\w|\w|\[|\]|\(|\)", rule)
for anitem in range(len(items)):
    bracketc = items[:anitem].count('[') - items[:anitem].count(']')
    if items[anitem] != '[' and items[anitem] != ']':
        if bracketc > 0:
            optional = True
        else:
            optional = False
        while optional == True:
            print ' '.join(items)
            it2 = items[:]
            it2.remove(it2[anitem])
            print ' '.join(it2)
            break

它适用于1个项目,并给定字符串AB [C] D,返回:

It works for 1 item, and given a string A B [C] D, returns:

A B [ C ] D
A B [ ] D

随着越来越复杂,所以我猜我需要完全不同的东西。

but breaks down at increasing complexity, so I am guessing I need something completely different.

推荐答案

在您的示例中,我编写了以下代码:

From your example, I've written the following piece of code :

rule="A(B|C)D[E]FG"

def generate_strings(rule):
    if not rule:
        return [""]
    begin,end=rule[0],rule[1:]
    if begin=='[':
        i=end.find(']')
        if i==-1:
            raise Exception("Unmatched '['")
        alt,end=end[0:i],end[i+1:]
        return [a+e for e in generate_strings(end) for a in [alt,""]]
    if begin=='(':
        i=end.find(')')
        if i==-1:
            raise Exception("Unmatched '('")
        alt,end=end[0:i].split('|'),end[i+1:]
        return [a+e for e in generate_strings(end) for a in alt]
    if begin in [']',')','|']:
        raise Exception("Unexpected " + begin)
    return [begin + e for e in generate_strings(end)]

print generate_strings(rule)

编辑:
这是尝试使事情与嵌套表达式一起使用。解析并非总是如此,因为解析现在变得更加微妙:当我们找到一个右括号时,它可能不是我们想要的一个,而是一个嵌套表达式的一个。

Edit : This is an attempt to make things work with nested expression. It doesn't quite work all the time as the parsing is much more delicate now : when we find a closing bracket, it might not be the one we want but the one for a nested expression. The same for pipes and parenthesis.

def flatten(l):
    return [item for sublist in l for item in sublist]

def generate_strings(rule):
    if not rule:
        return [""]
    begin,end=rule[0],rule[1:]
    if begin=='[':
        i=end.find(']')
        if i==-1:
            raise Exception("Unmatched '['")
        alt=flatten([generate_strings(a) for a in [end[0:i],""]])
        end=end[i+1:]
        return [a+e for e in generate_strings(end) for a in alt]
    if begin=='(':
        i=end.find(')')
        if i==-1:
            raise Exception("Unmatched '('")
        alt=flatten([generate_strings(a) for a in end[0:i].split('|')])
        end=end[i+1:]
        return [a+e for e in generate_strings(end) for a in alt]
    if begin in [']',')','|']:
        raise Exception("Unexpected " + begin)
    return [begin + e for e in generate_strings(end)]

print generate_strings(rule)

这篇关于给定语法规则,如何生成所有可能的字符串?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆