给定语法规则,如何生成所有可能的字符串? [英] How can I generate all possible strings given a grammar rule?
问题描述
在给定JSGF语法的情况下,我想生成它将映射到的所有终端字符串。例如,给定 A(B | C)D [E]
,我期望的输出将是:
I would like to, given a JSGF grammar, generate all the terminal strings it would map to. For example, given A (B | C) D [E]
, my desired output would be:
A B D E
A C D E
A B D
A C D
我决定从最简单的项目开始,即可选的支架,但很快就碰到了砖墙。它适用于1个项目,但不适用于具有替代项目的项目。任何意见,将不胜感激。
I decided to start with the easiest item, the optional brackets, but soon ran into a brick wall. It sort of works for 1 item, but not for an item with alternatives. Any advice would be appreciated.
我现在所拥有的:
import re
rule = raw_input('Enter the rule you want to test: ')
items = re.findall(r"\w[\w ]*\w|\w|\[|\]|\(|\)", rule)
for anitem in range(len(items)):
bracketc = items[:anitem].count('[') - items[:anitem].count(']')
if items[anitem] != '[' and items[anitem] != ']':
if bracketc > 0:
optional = True
else:
optional = False
while optional == True:
print ' '.join(items)
it2 = items[:]
it2.remove(it2[anitem])
print ' '.join(it2)
break
它适用于1个项目,并给定字符串AB [C] D,返回:
It works for 1 item, and given a string A B [C] D, returns:
A B [ C ] D
A B [ ] D
随着越来越复杂,所以我猜我需要完全不同的东西。
but breaks down at increasing complexity, so I am guessing I need something completely different.
推荐答案
在您的示例中,我编写了以下代码:
From your example, I've written the following piece of code :
rule="A(B|C)D[E]FG"
def generate_strings(rule):
if not rule:
return [""]
begin,end=rule[0],rule[1:]
if begin=='[':
i=end.find(']')
if i==-1:
raise Exception("Unmatched '['")
alt,end=end[0:i],end[i+1:]
return [a+e for e in generate_strings(end) for a in [alt,""]]
if begin=='(':
i=end.find(')')
if i==-1:
raise Exception("Unmatched '('")
alt,end=end[0:i].split('|'),end[i+1:]
return [a+e for e in generate_strings(end) for a in alt]
if begin in [']',')','|']:
raise Exception("Unexpected " + begin)
return [begin + e for e in generate_strings(end)]
print generate_strings(rule)
编辑:
这是尝试使事情与嵌套表达式一起使用。解析并非总是如此,因为解析现在变得更加微妙:当我们找到一个右括号时,它可能不是我们想要的一个,而是一个嵌套表达式的一个。
Edit : This is an attempt to make things work with nested expression. It doesn't quite work all the time as the parsing is much more delicate now : when we find a closing bracket, it might not be the one we want but the one for a nested expression. The same for pipes and parenthesis.
def flatten(l):
return [item for sublist in l for item in sublist]
def generate_strings(rule):
if not rule:
return [""]
begin,end=rule[0],rule[1:]
if begin=='[':
i=end.find(']')
if i==-1:
raise Exception("Unmatched '['")
alt=flatten([generate_strings(a) for a in [end[0:i],""]])
end=end[i+1:]
return [a+e for e in generate_strings(end) for a in alt]
if begin=='(':
i=end.find(')')
if i==-1:
raise Exception("Unmatched '('")
alt=flatten([generate_strings(a) for a in end[0:i].split('|')])
end=end[i+1:]
return [a+e for e in generate_strings(end) for a in alt]
if begin in [']',')','|']:
raise Exception("Unexpected " + begin)
return [begin + e for e in generate_strings(end)]
print generate_strings(rule)
这篇关于给定语法规则,如何生成所有可能的字符串?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!