Distinguishing identifiers from common strings


Question

I want to write a parser with Bison/Yacc and Lex that can parse statements like:

VARIABLE_ID = 'STRING' 

where:

ID       [a-zA-Z_][a-zA-Z0-9_]*

and:

STRING      [a-zA-Z0-9_]+

So var1 = '123abc' is a valid statement, while 1var = '123abc' is not.

Therefore, every VARIABLE_ID is a STRING, but a STRING is not always a VARIABLE_ID.

What I would like to know is whether the only way to distinguish the two is to write a checking procedure at a higher level (i.e., inside the Bison code), or whether I can work it out in the Lex code.

Recommended answer

Your abstract statement syntax is actually:

VARIABLE = STRING

not

VARIABLE = 'STRING'

because the quote delimiters are a lexical detail that we generally want to keep out of the syntax. The token patterns are therefore:

ID       [a-zA-Z_][a-zA-Z0-9_]*
STRING   '[a-zA-Z_0-9]*'

An ID is a letter or underscore, followed by any combination (including none) of letters, digits and underscores.

A STRING is a single quote, followed by a (possibly empty) sequence of letters, digits and underscores, followed by another single quote.

So the ambiguity you are concerned about does not exist. Because a STRING always starts with a quote and an ID never does, an ID is not in fact a STRING, nor vice versa, and the lexer can tell them apart on its own.
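
To illustrate why the two patterns cannot collide, here is a minimal C sketch that classifies a complete lexeme by hand against the ID and STRING patterns above. It is not how Lex works internally (Lex compiles the patterns into an automaton), and the names token_kind, is_word_char and classify_lexeme are hypothetical, introduced only for this example. It also assumes the default "C" locale, so that isalpha and isalnum accept only ASCII letters and digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds, for illustration only. */
enum token_kind { TOK_INVALID, TOK_ID, TOK_STRING };

/* Characters allowed after the first position in both patterns: [a-zA-Z0-9_]. */
static int is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Classify a complete lexeme against:
 *   ID      [a-zA-Z_][a-zA-Z0-9_]*
 *   STRING  '[a-zA-Z_0-9]*'
 * The first character alone decides which pattern can apply. */
enum token_kind classify_lexeme(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);

    /* STRING: opening quote, word characters, closing quote. */
    if (n >= 2 && s[0] == '\'' && s[n - 1] == '\'') {
        for (size_t i = 1; i + 1 < n; i++)
            if (!is_word_char((unsigned char)s[i]))
                return TOK_INVALID;
        return TOK_STRING;
    }

    /* ID: letter or underscore, then word characters. */
    if (n >= 1 && (isalpha((unsigned char)s[0]) || s[0] == '_')) {
        for (size_t i = 1; i < n; i++)
            if (!is_word_char((unsigned char)s[i]))
                return TOK_INVALID;
        return TOK_ID;
    }

    return TOK_INVALID;
}
```

With this sketch, var1 is classified as an ID, '123abc' (quotes included) as a STRING, and 1var as neither, which matches the valid and invalid statements in the question.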

Somewhere inside your Bison parser, or possibly in the lexer, you will probably want to massage the yytext of a STRING match to remove the quotes and keep only the text between them as the string value. This could be done in a Bison rule, possibly similar to:

string : STRING 
       {
          $$ = strip_quotes($1);
       }
       ;
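
The answer does not show strip_quotes, so the following is only one possible sketch of such a helper in C. It assumes that the semantic value $1 is a char * holding a private copy of the matched text with both quotes still attached (the lexer action has to copy yytext, for example with strdup, because the scanner reuses that buffer on the next match). The ownership convention, where the caller frees the result, is likewise an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `quoted`
 * without its first and last character (the single quotes).
 * Returns NULL if allocation fails; the caller frees the result. */
char *strip_quotes(const char *quoted)
{
    assert(quoted != NULL);

    size_t n = strlen(quoted);
    /* The STRING pattern guarantees both delimiters are present. */
    assert(n >= 2 && quoted[0] == '\'' && quoted[n - 1] == '\'');

    size_t inner = n - 2;            /* length of the text between the quotes */
    char *out = malloc(inner + 1);   /* +1 for the terminating NUL */
    if (out == NULL)
        return NULL;

    memcpy(out, quoted + 1, inner);
    out[inner] = '\0';
    return out;
}
```

Called on '123abc' (quotes included), this returns the text 123abc; called on '' it returns an empty string. If the original copy in $1 is no longer needed after stripping, the rule's action should free it as well to avoid a leak.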
