Ply Lex parsing problem

Question

I'm using ply as my lex parser. My specifications are the following:

t_WHILE = r'while'  
t_THEN = r'then'  
t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'  
t_NUMBER = r'\d+'  
t_LESSEQUAL = r'<='  
t_ASSIGN = r'='  
t_ignore  = r' \t'  

When I try to parse the following string:

"while n <= 0 then h = 1"

it gives the following output:

LexToken(ID,'while',1,0)  
LexToken(ID,'n',1,6)  
LexToken(LESSEQUAL,'<=',1,8)  
LexToken(NUMBER,'0',1,11)  
LexToken(ID,'hen',1,14)      ------> PROBLEM!  
LexToken(ID,'h',1,18)  
LexToken(ASSIGN,'=',1,20)  
LexToken(NUMBER,'1',1,22)  

It doesn't recognize the token THEN; instead, it takes "hen" as an identifier.

Any ideas?

Recommended answer

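This is related to the way ply prioritises token rules: rules defined as strings are sorted by decreasing regular-expression length, so the longest regex is tested first. The t_ID pattern is longer than r'while' and r'then', so it is tried before them and the keywords are matched as identifiers. Separately, t_ignore is a plain string of characters to skip rather than a regex, so the raw string r' \t' ignores a space, a backslash, and the letter t; that is why "then" loses its first letter and is reported as "hen".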
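Both effects can be reproduced without ply. The following is only a toy sketch, not ply's implementation: the `rules` dict, the `tokenize` helper, and the sort by regex length are assumptions made to mimic the behaviour described above, using nothing but the standard `re` module.

```python
import re

# Rules from the question, as token name -> regex.
rules = {
    'WHILE': r'while',
    'THEN': r'then',
    'ID': r'[a-zA-Z_][a-zA-Z0-9_]*',
    'NUMBER': r'\d+',
    'LESSEQUAL': r'<=',
    'ASSIGN': r'=',
}

# Assumption: imitate ply's ordering of string rules (longest regex first).
ordered = sorted(rules.items(), key=lambda kv: len(kv[1]), reverse=True)

def tokenize(text, ignore):
    """Toy lexer: skip characters listed in `ignore`, then try rules in order."""
    pos, out = 0, []
    while pos < len(text):
        if text[pos] in ignore:  # `ignore` is a set of characters, not a regex
            pos += 1
            continue
        for name, pattern in ordered:
            m = re.compile(pattern).match(text, pos)
            if m:
                out.append((name, m.group(), pos))
                pos = m.end()
                break
        else:
            raise ValueError('Illegal character at position %d' % pos)
    return out

source = "while n <= 0 then h = 1"
buggy = tokenize(source, r' \t')  # characters: space, backslash, 't'
fixed = tokenize(source, ' \t')   # characters: space, tab

print(buggy[4])  # ('ID', 'hen', 14)
print(fixed[4])  # ('ID', 'then', 13)
```

With the non-raw ignore string the full word "then" is kept, but it is still reported as ID because the identifier pattern is tried first; that remaining problem is what the reserved-word lookup addresses.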

The easiest way to prevent this problem is to match identifiers and reserved words with the same rule, and then select the appropriate token type based on the matched text. The following code is similar to an example in the ply documentation:

import ply.lex

tokens = [ 'ID', 'NUMBER', 'LESSEQUAL', 'ASSIGN' ]
reserved = {
    'while' : 'WHILE',
    'then' : 'THEN'
}
tokens += reserved.values()

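# t_ignore is a plain string of characters to skip, not a regex,
# so it must not be a raw string: ' \t' means space and tab.
t_ignore    = ' \t'
t_NUMBER    = r'\d+'
t_LESSEQUAL = r'<='
t_ASSIGN    = r'='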

def t_ID(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    if t.value in reserved:
        t.type = reserved[ t.value ]
    return t

def t_error(t):
    print("Illegal character %r" % t.value[0])
    t.lexer.skip(1)

lexer = ply.lex.lex()
lexer.input("while n <= 0 then h = 1")
while True:
    tok = lexer.token()
    if not tok:
        break
    print(tok)
