BNF可以处理远期消费吗? [英] Can a BNF handle forward consumption?

查看:123
本文介绍了BNF可以处理远期消费吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

最近,我发现了python模块 pyparsing ,这是通过编写语法而不是解析器来解析数据的绝佳工具。我是无上下文语法的新手,因此请更正此问题中的所有错误假设。

Recently I've discovered the python module pyparsing, a wonderful tool for parsing data by writing the grammar, not the parser. I'm new to the idea of context-free grammars, so please correct any false assumptions in this question.

Pyparsing可以实现BNF( Backus–Naur形式)的上下文无关文法。该语法可以递归,但是可以向前看吗?自从偶然发现这个问题以来,我一直在想这个问题的答案 。让我给你一个具体的例子。考虑以下字符串:

Pyparsing can implement a BNF (Backus–Naur Form) context-free grammar. This grammar can be recursive, but can it have a forward look-ahead? I've been wondering about the answer to this since I stumbled across this question. Let me give you a concrete example. Consider the string:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

让语法看起来像这样:

<number> :: __<digit>__
<block>  :: <number>(x) (<number> <number> <number> .... x times)

ie读取第一个数字令牌,将其另存为 x ,然后使用下一个 x 编号并将它们分组在一起。解析后的行应如下所示:

i.e. read the first number token, save it as x and then consume the next x numbers and group them together. The parsed line should look like:

 [[1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14], [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]]

我用pyparsing编写了一个简单的python MWE not ,所以很明显我想做的事情:

I've written a simple python MWE not using pyparsing so it's clear what I'm trying to do:

A = range(1,31)

B, sub_b = [], []
consume  = 0

for a in A:
    if consume: 
        sub_b.append(a)
        consume -= 1
    else:
        if sub_b: B.append(sub_b)
        sub_b = [a,]
        consume = a
B.append(sub_b)
print B

两个(相关)问题:可以使用BNF上下文无关的语法来完成?如果是/否,我该如何使用 pyparsing

Two (related) questions: Can this be done with a BNF context-free grammar? If yes/no how can I do it with pyparsing?

推荐答案

在这种情况下,没有上下文无关的语法或常规语法中的参数化大小是没有问题的。像 consum 这样的数字参数不是CFG模型的一部分,可以证明以不同的方式获得效果是不可能的。但是,您可以为任何 fixed 长度编写生产,因此,您可以为长度为1,长度2,长度3等的块编写生产。

There is no such thing as a parametrized size in a context-free grammar-- or in a regular grammar, for that matter. Numerical parameters like your consume are not part of the CFG model, and it can be proved that getting the effect in a different way is impossible. You can write a production for any fixed length, though, so you can write productions for a block of length 1, length 2, length 3, etc.:

<block3> :: <number> <number> <number>

或类似地,将长度匹配为前缀或什至作为后缀:

or, similarly, matching the length as a prefix or even as a post-fix:

<block3a> :: 3 <number> <number> <number>
<block3b> :: <number> <number> <number> 3

等。因此,要做您想做的事情,您只需要一个包含此类规则的语法,就可能需要使用所有 N 规则。

etc. So to do what you want, you just need a grammar containing rules of the kind, for all N you might need.

给定的CFG仅包含有限数量的此类产品。从数学上讲,不可能编写可以处理无限制参数化大小的CFG(以BNF或任何其他形式),无论是作为前缀还是作为后缀。在实践中,您可能可以根据需要动态更新CFG。例如,读取数字 N 并为您的语法创建规则 blockN (如果尚不存在)。但是,没有单个 CFG可以捕获无限制的参数化大小。

A given CFG will only include a finite number of these productions. It is mathematically impossible to write a CFG (in BNF or any other form) that can handle unlimited parametrized sizes, whether as a prefix or as a postfix. In practice, you might be able to update your CFG on the fly with new productions as needed. E.g., read a number N and create a rule blockN for your grammar if it's not already present. But there's no single CFG that can capture unbounded parametrized sizes.

编辑,因为您还询问过上下文-敏感的语法:仍然不会这样做。问题在于使用整数算术,而不是语法的类别。 Chomsky层次结构上的 Any 形式语言是根据有限数量的符号(令牌)定义的,并且由于存在无限多个整数,因此无法为它们分配不同的含义(请注意,您的解析过程依赖于整数算术)。

Edit, since you also asked about context-sensitive grammars: That still won't do it. The problem is the use of integer arithmetic, not the class of the grammar. Any formal language on the Chomsky hierarchy is defined in terms of a finite number of symbols (tokens), and since are infinitely many integers, they cannot be assigned distinct meanings (note that your parsing procedure relies on integer arithmetic).

如果您要将该长度预处理为多颗恒星的序列( * * * 4 7 10 ),CFG解析器就很简单了:

If you were to pre-process the length into a sequence of as many stars (* * * 4 7 10), the CFG parser is trivial:

<group> :: * <group> <number>
<group> :: * <number>

这就是所谓的 a ^ nb ^ n 语言。您也可能有表示十的符号,等等。但是,如果不进行预处理,唯一的解决方案(以及您的过程或图灵机在实践中所做的事情)是在语法中解释数字符号。例如,将 21解析为十点十一。我怀疑是否可以在CFG中完成此操作(问题是处理任意长数字,而没有针对数百万,数十亿等的单独规则),但我不确定。无论哪种方式,它都只是作为一种学术练习而有趣,因为使用实数非常容易。我敢肯定人们已经研究了带有整数的形式语言的属性,但是我对此无能为力。

This is just the so-called a^n b^n language. You could also have symbols that mean "ten", etc. But without pre-processing, the only solution (and what your procedure or a Turing machine does in practice) is to interpret numeral notation in your grammar. E.g., parse "21" as ten ten one. I doubt this can be done in a CFG (the problem is handling arbitrarily long numerals without separate rules for millions, billions etc.), but I'm not really sure. Either way it is only interesting as an academic exercise, since using real integers is so much easier. I'm sure people have studied the properties of formal languages with integers, but I can't say anything about that.

这篇关于BNF可以处理远期消费吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆