我对此XML子集有LL(1)语法吗? [英] Do I Have an LL(1) Grammar for This Subset of XML?

查看:86
本文介绍了我对此XML子集有LL(1)语法吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我将使用以下EBNF语法为XML的虚构子集创建一个解析器:

I'm about to make a parser for a fictional subset of XML with the following EBNF grammar:

DOCUMENT  ::=  ELEMENT
ELEMENT   ::=  START_TAG (ELEMENT | DATA)* END_TAG | EMPTY_TAG
START_TAG ::=  < NAME ATTRIBUTE* >
END_TAG   ::=  </ NAME >
EMPTY_TAG ::=  < NAME ATTRIBUTE* />
ATTRIBUTE ::=  NAME = STRING

以上是按原样"语法,没有任何更改. 这是我尝试将其转换为LL(1):

The above is the grammar 'as is', without any changes. Here is my attempt at converting it to LL(1):

DOCUMENT         ::=    ELEMENT EOF 
ELEMENT          ::=    PREFIX > ELEMENT_OR_DATA END_TAG
                      | PREFIX />
PREFIX           ::=    < NAME OPT_ATTR 
ELEMENT_OR_DATA  ::=      OPT_ELEMENT ELEMENT_OR_DATA 
                        | OPT_DATA ELEMENT_OR_DATA 
                        | epsilon
OPT_ELEMENT      ::=    ELM_LIST | epsilon
ELM_LIST         ::=    ELEMENT  | ELEMENT ELM_LIST
OPT_DATA         ::=    DATA_LIST | epsilon
DATA_LIST        ::=    DATA | DATA DATA_LIST
END_TAG          ::=    </ NAME >
OPT_ATTR         ::=    ATTR_LIST | epsilon
ATTR_LIST        ::=    ATTRIBUTE | ATTRIBUTE ATTR_LIST
ATTRIBUTE        ::=    NAME = STRING 
EOF              ::=         &$

这是原始文件的LL(1)版本吗?如果没有,我哪里出了问题?如果是这样,有没有办法在不改变含义的情况下简化"?我不相信我拥有最简单的版本.

Is this an LL(1) version of the original? If not, where did I go wrong? And if so, is there any way to 'simplify' without changing the meaning? I'm not convinced I have the simplest possible version.

希望这很清楚.

推荐答案

LL(1)解析器仅通过查看下一个标记就无法为ELEMENT的两个规则选择正确的规则. 根据语法,解析器应尝试第一个规则: ELEMENT ::= PREFIX > ELEMENT_OR_DATA END_TAG 并且如果它不起作用,则必须从递归(回溯)中返回,以尝试第二条规则: ELEMENT ::= PREFIX />

LL(1) parser cannot pick correct rule for ELEMENT's two rules just by looking at the next token. According to the grammar, the parser should try the first rule: ELEMENT ::= PREFIX > ELEMENT_OR_DATA END_TAG And if it does not work, it must return from the recursion (backtracking) in order to try the second rule: ELEMENT ::= PREFIX />

问题是两个规则都从相同的对象"前缀开始. 在这种情况下,它甚至更糟,因为它不是终端.

The problem is that both rules start from the same "object" PREFIX. In this case, it is even "worse" because, it is not terminal.

当然,这不是LL(1)语法.让我们尝试构建一个.

Definitely, this is not LL(1) grammar. Let's try to build one.

我们首先通过删除TAG来简化原始语法: DOCUMENT ::= ELEMENT ELEMENT ::= < NAME ATTRIBUTE* > (ELEMENT | DATA)* </ NAME > ELEMENT ::= < NAME ATTRIBUTE* /> ATTRIBUTE ::= NAME = STRING

We first simplify the original grammar, by removing TAG's: DOCUMENT ::= ELEMENT ELEMENT ::= < NAME ATTRIBUTE* > (ELEMENT | DATA)* </ NAME > ELEMENT ::= < NAME ATTRIBUTE* /> ATTRIBUTE ::= NAME = STRING

下一步是为ELEMENT拆分规则以获取第一个令牌,这将有助于解析器选择正确的规则. DOCUMENT ::= ELEMENT ELEMENT ::= < NAME ATTRIBUTE* ELEMENT1 ELEMENT1 ::= > (ELEMENT | DATA)* </ NAME > ELEMENT1 ::= /> ATTRIBUTE ::= NAME = STRING

The next step is to split rules for ELEMENT in order to get the first token, which will help to the parser to select correct rule. DOCUMENT ::= ELEMENT ELEMENT ::= < NAME ATTRIBUTE* ELEMENT1 ELEMENT1 ::= > (ELEMENT | DATA)* </ NAME > ELEMENT1 ::= /> ATTRIBUTE ::= NAME = STRING

现在解析器可以成功开始解析元素.它推迟"是扩展元素还是短元素的决定,并将此问题委托给ELEMENT1规则.后者可以通过检查下一个标记是>还是/>来确定要解析的元素类型.

Now parser can successfully start parsing an element. It "postpones" the decision whether it is extended or short element, and delegates this question to the ELEMENT1-rules. The later one can determine which type of element is being parsed by checking whether the next token is > or />.

让我们继续进行转换: DOCUMENT ::= ELEMENT ELEMENT ::= < NAME ATTRIBUTES ELEMENT1 ELEMENT1 ::= > ELEMENT_OR_DATA </ NAME > ELEMENT1 ::= /> ELEM_OR_DATA ::= ELEMENT ELEM_OR_DATA ELEM_OR_DATA ::= DATA ELEM_OR_DATA ELEM_OR_DATA ::= epsilon ATTRIBUTES ::= NAME = STRING ATTRIBUTES ATTRIBUTES ::= epsilon

Let's continue the transformation: DOCUMENT ::= ELEMENT ELEMENT ::= < NAME ATTRIBUTES ELEMENT1 ELEMENT1 ::= > ELEMENT_OR_DATA </ NAME > ELEMENT1 ::= /> ELEM_OR_DATA ::= ELEMENT ELEM_OR_DATA ELEM_OR_DATA ::= DATA ELEM_OR_DATA ELEM_OR_DATA ::= epsilon ATTRIBUTES ::= NAME = STRING ATTRIBUTES ATTRIBUTES ::= epsilon

我们只是将* -operator替换为正确的LL语法. 最后一个语法仍然存在一些问题:前两个ELEM_OR_DATA规则可能会混淆"解析器,因为它无法猜测要应用哪个解析器(与我们一开始所讨论的问题类似).

We just replaced *-operator with proper LL syntax. This last grammar still has some issue: the first two ELEM_OR_DATA rules may "confuse" the parser, because it cannot guess which one to apply (similar problem to one we discussed at the beginning).

让我们通过解析器提示来解决此问题: DOCUMENT ::= ELEMENT EOF ELEMENT ::= < ELEMENT0 ELEMENT0 ::= NAME ATTRIBUTES ELEMENT1 ELEMENT1 ::= > ELEMENT_OR_DATA </ NAME > ELEMENT1 ::= /> ELEM_OR_DATA ::= < ELEMENT0 ELEM_OR_DATA ELEM_OR_DATA ::= DATA ELEM_OR_DATA ELEM_OR_DATA ::= epsilon ATTRIBUTES ::= NAME = STRING ATTRIBUTES ATTRIBUTES ::= epsilon

Let's solve this issue by giving a hint to the parser: DOCUMENT ::= ELEMENT EOF ELEMENT ::= < ELEMENT0 ELEMENT0 ::= NAME ATTRIBUTES ELEMENT1 ELEMENT1 ::= > ELEMENT_OR_DATA </ NAME > ELEMENT1 ::= /> ELEM_OR_DATA ::= < ELEMENT0 ELEM_OR_DATA ELEM_OR_DATA ::= DATA ELEM_OR_DATA ELEM_OR_DATA ::= epsilon ATTRIBUTES ::= NAME = STRING ATTRIBUTES ATTRIBUTES ::= epsilon

在第一个ELEM_OR_DATA规则中,我们拆分了ELEMENT1并使用了ELEMENT0.现在假设DATA是一个令牌,解析器只需查看下一个令牌就可以轻松确定要应用的规则.

We splitted ELEMENT1 and used ELEMENT0 in the first ELEM_OR_DATA rule. Now assuming DATA is a token, parser can easily determine which rule to apply by looking at the next token only.

这篇关于我对此XML子集有LL(1)语法吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆