如何一次一次灵活地返回多个终端 [英] How can flex return multiple terminals at one time

查看:92
本文介绍了如何一次一次灵活地返回多个终端的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

为了使我的问题更容易理解,我想使用以下示例:

In order to make my question easy to understand I want to use the following example:

以下代码在fortran语言中称为非阻塞循环

The following code is called nonblock do-loop in fortran language

DO 20 I=1, N      ! line 1
DO 20 J=1, N      ! line 2
    ! more codes
20  CONTINUE      ! line 4

请注意,第4行的标签 20 表示内部do-loop和外部do-loop的结尾.

Pay attention that the label 20 at line 4 means the end of both the inner do-loop and the outer do-loop.

我希望我的flex程序正确解析该功能:当flex读取标签 20 时,它将两次返回 ENDDO 终端.

I want my flex program to parse the feature correctly: when flex reads the label 20, it will return ENDDO terminal twice.

首先,由于我也使用了bison,因此每次bison调用 yylex()来获得一个终端.如果我可以要求野牛在某些情况下从 yylex()获取终端,而在其他情况下可以从另一个函数获取终端,也许我可以解决此问题,但是,到那时我还是不知道.

Firstly, because I also use bison, so every time bison calls yylex() to get one terminal. If I can ask bison to get terminals from yylex() in some cases, and from another function in other cases, maybe I could solve this problem, however, I got no idea here then.

当然,有一些解决方法,例如,我可以使用flex的启动条件,但我认为这不是一个好的解决方案.所以我问如果没有解决方法,是否有办法解决我的问题?

Of course there are some workarounds, for eample, I can use flex's start condition but I don't think it is a good solution. So I ask if there's any way to solve my question without a workaround?

推荐答案

很容易修改(f)lex生成的词法扫描器以实现令牌队列,但这不一定是最佳解决方案.(请参见下文,以获得更好的解决方案.)(此外,对于您的特定问题,我真的不清楚,在词法分析器中构造额外的令牌是否确实合适.)

It is easy enough to modify the lexical scanner produced by (f)lex to implement a token queue, but that is not necessarily the optimal solution. (See below for a better solution.) (Also, it is really not clear to me that for your particular problem, fabricating the extra token in the lexer is truly appropriate.)

通常的方法是将代码插入 yylex 函数的顶部,您可以通过将代码紧接在 %% 行之后和第一行之前来实现.规则.(必须对代码进行缩进,以便不能将其解释为规则.)对于非可重入扫描程序,通常将涉及使用本地 static 变量来保存队列.对于一个简单但愚蠢的示例,使用C API但使用C ++进行编译,以便可以访问C ++标准库:

The general approach is to insert code at the top of the yylex function, which you can do by placing the code immediately after the %% line and before the first rule. (The code must be indented so that it is not interpreted as a rule.) For non-reentrant scanners, this will typically involve the use of a local static variable to hold the queue. For a simple but dumb example, using the C API but compiling with C++ so as to have access to the C++ standard library:

%%
  /* This code will be executed each time `yylex` is called, before
   * any generated code. It may include declarations, even if compiled
   * with C89.
   */
  static std::deque<int> tokenq;
  if (!tokenq.empty()) {
    int token = tokenq.front();
    tokenq.pop_front();
    return token;
  }
[[:digit:]]+  { /* match a number and return that many HELLO tokens */
                   int n = atoi(yytext);
                   for (int i = 0; i < n; ++i)
                     tokenq.push_back(HELLO);
              }

上面的代码没有尝试为排队的令牌提供语义值;您可以使用类似 std :: queue< std :: pair< int,YYSTYPE>> 的令牌队列来实现此目的,但是通常 YYSTYPE union 会带来一些麻烦.同样,如果那是使用令牌队列的唯一原因,则很明显,可以用一个简单的计数器代替它,这样会更高效.例如,请参见此答案,其含义与您的问题模棱两可(并请注意注释1中的建议).该答案).

The above code makes no attempt to provide a semantic value for the queued tokens; you could achieve that using something like a std::queue<std::pair<int, YYSTYPE>> for the token queue, but the fact that YYSTYPE is typically a union will make for some complications. Also, if that were the only reason to use the token queue, it is obvious that it could be replaced with a simple counter, which would be much more efficient. See, for example, this answer which does something vaguely similar to your question (and take note of the suggestions in Note 1 of that answer).

尽管令牌队列解决方案既有吸引力又简单,但它很少是最佳解决方案.在大多数情况下,如果您要求野牛产生.使用推式解析器,每当令牌可用时,词法分析器就会调用解析器.这使得从词法分析器操作返回多个令牌变得很简单;您只需为每个令牌调用解析器.同样,如果规则不产生任何令牌,则它根本无法调用解析器.在此模型中,实际上 return s的唯一词法分析器操作是<< EOF>> 规则,并且只有在使用 END 令牌以指示解析已完成.

Although the token queue solution is attractive and simple, it is rarely the best solution. In most cases, code will be clearer and easier to write if you request bison to produce a "push parser". With a push parser, the parser is called by the lexer every time a token is available. This makes it trivial to return multiple tokens from a lexer action; you just call the parser for each token. Similarly, if a rule doesn't produce any tokens, it simply fails to call the parser. In this model, the only lexer action which actually returns is the <<EOF>> rule, and it only does so after calling the parser with the END token to indicate that parsing is complete.

不幸的是,推送解析器的界面不仅会更改,正如该手动链接所指示的那样;它也非常糟糕地被记录在案.因此,这是一个简单但完整的示例,显示了如何完成此操作.

Unfortunately, the interface for push parsers is not only subject to change, as that manual link indicates; it is also very badly documented. So here is a simple but complete example which shows how it is done.

推式解析器将其状态保持在 yypstate 结构中,该结构需要在每次调用时传递给解析器.由于该词法分析器仅对每个输入文件调用一次,因此该词法分析器拥有该结构是合理的,可以使用局部静态变量按上述方法完成[注1]:在 yylex <时初始化解析器状态./code>被调用,并且 EOF 规则删除解析器状态以便回收正在使用的任何内存.

The push parser keeps its state in a yypstate structure, which needs to be passed to the parser on each call. Since the lexer is called only once for each input file, it is reasonable for the lexer to own that structure, which can be done as above with a local static variable [Note 1]: the parser state is initialized when yylex is called, and the EOF rule deletes the parser state in order to reclaim whatever memory it is using.

通常,构建可重入的推送解析器最方便,这意味着解析器不依赖于全局 yylval 变量[注2].而是必须提供指向语义值的指针作为 yypush_parse 的附加参数.如果您的解析器没有引用特定令牌类型的语义值,则可以为此参数提供NULL.或者,如下面的代码所示,您可以在词法分析器中使用局部语义值变量.不必每次对推式解析器的调用都提供相同的指针.总之,对扫描仪定义的更改很小:

It is usually most convenient to build a reentrant push parser, which means that the parser does not rely on the global yylval variable [Note 2]. Instead, a pointer to the semantic value must be provided as an additional argument to yypush_parse. If your parser doesn't refer to the semantic value for the particular token type, you can provide NULL for this argument. Or, as in the code below, you can use a local semantic value variable in the lexer. It is not necessary that every call to the push parser provide the same pointer. In all, the changes to the scanner definition are minimal:

%%
  /* Initialize a parser state object */
  yypstate* pstate = yypstate_new();
  /* A semantic value which can be sent to the parser on each call */
  YYSTYPE yylval;
  /* Some example scanner actions */
"keyword"    {  /* Simple keyword which just sends a value-less token */
                yypush_parse(pstate, TK_KEYWORD, NULL); /* See Note 3 */
             }
[[:digit:]]+ { /* Token with a semantic value */
               yylval.num = atoi(yytext);
               yypush_parse(pstate, TK_NUMBER, &yylval);
             }
"dice-roll"  { /* sends three random numbers */
               for (int i = 0; i < 2; ++i) {
                 yylval.num = rand() % 6;
                 yypush_parse(pstate, TK_NUMBER, &yylval);
             }
<<EOF>>      { /* Obligatory EOF rule */
               /* Send the parser the end token (0) */
               int status = yypush_parse(pstate, 0, NULL);
               /* Free the pstate */
               yypstate_delete(pstate);
               /* return the parser status; 0 is success */
               return status;
             }

在解析器中,除了添加必要的声明外,不需要做太多更改:[注4]

In the parser, not much needs to be changed at all, other than adding the necessary declarations: [Note 4]

%define api.pure full
%define api.push-pull push


注释

  1. 如果您还要构建可重入词法分析器,则应使用词法分析器状态对象的多余数据部分,而不要使用静态变量.

  1. If you were building a reentrant lexer as well, you would use the extra data section of the lexer state object instead of static variables.

如果您在解析器中使用位置对象来跟踪源代码位置,则这也适用于 yylloc .

If you are using location objects in your parser to track source code locations, this also applies to yylloc.

该示例代码无法很好地检测错误,因为它不检查对 yypush_parse 的调用中的返回代码.我通常使用的一种解决方案是对 SEND 宏进行一些修改:

The example code does not do a good job of detecting errors, since it doesn't check return codes from the calls to yypush_parse. One solution I commonly use is some variant on the macro SEND:

#define SEND(token) do {                              \
  int status = yypush_parse(pstate, token, &yylval);  \
  if (status != YYPUSH_MORE) {                        \
    yypstate_delete(pstate);                          \
    return status;                                    \
  }                                                   \
} while (0)

也可以使用 goto 来避免 yypstate_delete return 的多个实例.YMMV.

It's also possible to use a goto to avoid the multiple instances of the yypstate_delete and return. YMMV.

您可能需要修改 yyerror 的原型.如果您正在使用位置和/或为push_parser提供额外的参数,则 yyerror 调用中还将出现位置对象和/或额外的参数.(错误字符串始终是 last 参数.)无论出于何种原因,解析器状态对象都 not 不提供给 yyerror ,这意味着 yyerror 函数不再可以访问诸如 yych 之类的变量,这些变量现在是 yypstate 结构的成员,而不是全局变量,因此如果您在错误报告中使用这些变量(实际上不建议这样做),那么您将不得不找到替代解决方案.

You may have to modify the prototype of yyerror. If you are using locations and/or providing extra parameters to the push_parser, the location object and/or the extra parameters will also be present in the yyerror call. (The error string is always the last parameter.) For whatever reason, the parser state object is not provided to yyerror, which means that the yyerror function no longer has access to variables such as yych, which are now members of the yypstate structure rather than being global variables, so if you use these variables in your error reporting (which is not really recommended practice), then you will have to find an alternative solution.

这篇关于如何一次一次灵活地返回多个终端的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆