Distinguishing identifiers from common strings
Question
I want to write a parser using Bison/Yacc + Lex which can parse statements like:
VARIABLE_ID = 'STRING'
where:
ID [a-zA-Z_][a-zA-Z0-9_]*
and:
STRING [a-zA-Z0-9_]+
So, var1 = '123abc' is a valid statement while 1var = '123abc' isn't.
Therefore, a VARIABLE_ID is a STRING, but a STRING is not always a VARIABLE_ID.
What I would like to know is whether the only way to distinguish between the two is to write a checking procedure at a higher level (i.e. inside the Bison code), or whether I can work it out in the Lex code.
Recommended answer
Your abstract statement syntax is actually:
VARIABLE = STRING
and not
VARIABLE = 'STRING'
because the quote delimiters are a lexical detail that we generally want to keep out of the syntax. So the token patterns are actually these:
ID [a-zA-Z_][a-zA-Z0-9_]*
STRING '[a-zA-Z_0-9]*'
An ID is a letter or underscore, followed by any combination (including empty) of letters, digits and underscores.
A STRING is a single quote, followed by a sequence (possibly empty) of letters, digits and underscores, followed by another single quote.
So the ambiguity you are concerned about does not exist. An ID always begins with a letter or underscore and a STRING always begins with a single quote, so an ID is not in fact a STRING, nor vice versa.
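To make the disjointness concrete, here is a minimal C sketch that hand-checks a complete lexeme against the two patterns above. It is only an illustration of what the Lex rules would decide, not a replacement for them; the names classify, is_word_char and the token_kind enum are made up for this example, and the character tests assume the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not from the original post. */
enum token_kind { TOK_NONE, TOK_ID, TOK_STRING };

/* True for the characters both patterns allow: [a-zA-Z0-9_]
 * (assuming the default "C" locale). */
static int is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Classify a complete lexeme against the two patterns:
 *   ID      [a-zA-Z_][a-zA-Z0-9_]*
 *   STRING  '[a-zA-Z_0-9]*'
 * The first character alone decides which pattern can apply. */
enum token_kind classify(const char *s)
{
    assert(s != NULL);
    size_t i;

    if (s[0] == '\'') {
        /* Skip the body, then require a closing quote and nothing after it. */
        i = 1;
        while (is_word_char((unsigned char)s[i]))
            i++;
        return (s[i] == '\'' && s[i + 1] == '\0') ? TOK_STRING : TOK_NONE;
    }
    if (isalpha((unsigned char)s[0]) || s[0] == '_') {
        /* Skip the rest of the identifier; it must run to the end. */
        i = 1;
        while (is_word_char((unsigned char)s[i]))
            i++;
        return s[i] == '\0' ? TOK_ID : TOK_NONE;
    }
    return TOK_NONE;
}
```

With this sketch, var1 classifies as an ID, '123abc' as a STRING, and 1var as neither, which matches the valid and invalid statements in the question.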
Somewhere inside your Bison parser, or possibly in the lexer, you might want to massage the yytext of a STRING match to remove the quotes and retain only the text between them as a string. This could be done in a Bison rule, possibly similar to:
string : STRING
         {
           $$ = strip_quotes($1);
         }
       ;
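The answer names strip_quotes but does not define it. One possible shape for it, as a sketch only: the function below assumes its argument is a NUL-terminated lexeme that already matched the STRING pattern, and it returns a freshly allocated copy that the caller must free. It also assumes the lexer stored its own copy of yytext in the token's semantic value, since yytext itself is only valid until the next token is scanned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of a quoted lexeme
 * such as 'abc' without its surrounding single quotes.
 * The caller owns the result and must free() it.
 * Returns NULL if the allocation fails. */
char *strip_quotes(const char *lexeme)
{
    assert(lexeme != NULL);
    size_t len = strlen(lexeme);

    /* The STRING pattern guarantees at least the two quote characters. */
    assert(len >= 2 && lexeme[0] == '\'' && lexeme[len - 1] == '\'');

    size_t inner = len - 2;           /* length of the text between the quotes */
    char *out = malloc(inner + 1);    /* +1 for the terminating NUL */
    if (out == NULL)
        return NULL;

    memcpy(out, lexeme + 1, inner);   /* copy everything after the opening quote */
    out[inner] = '\0';
    return out;
}
```

Allocating a new string rather than editing in place keeps the original lexeme intact, at the cost of making the grammar actions responsible for freeing the result.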