How to translate token names in bison

Question

I have a bison parser that works sufficiently well for my purpose. It even prints localized error messages, but the token names are not translated. Looking at the source code, I found that I can define YY_ as my own gettext function and pass YY_ to gettext in order to provide my own translations of the error messages. But this does not work for token names.

Is there some switch or hidden feature that I could use to extract the token names from the parser and to translate them?

So far I have found yytnamerr, which could be overridden to format the token names. But since it does more than just reformat names, I'd rather not touch this function, because I would have to keep it in sync with Bison's development. On the other hand, I also need a way to extract the token names from the parser in order to add them to the language definition file.

How do you implement user friendly error reporting with Bison?

Recommended answer

If you specify %token-table, then bison will generate the yytname table. This table includes all bison symbols: the internal symbols ($end, error and $undefined), the terminals (named tokens, single-quoted characters and double-quoted strings) and the non-terminals, which also include the generated names for mid-rule actions.

With yytname visible, it's easy to extract the tokens in a format recognizable by the gettext package. For example, you could add to your .y file something like this:

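/* Put this in the epilogue (after the second %%), where the static yytname is in scope. */
#ifdef MAKE_TOKEN
#include <stdio.h>

int main(void) {
  puts("#include <libintl.h>");
  puts("#include <stdio.h>");
  puts("int main(void) {");
  for (const char* const* p = yytname; *p; ++p) {
    // See Note 1 below
    printf("  printf(\"%%s: %%s\\n\", \"%s\", gettext (\"%s\"));\n", *p, *p);
  }
  puts("  return 0;");
  puts("}");
  return 0;
}
#endif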

and then add a stanza to your Makefile (making appropriate substitutions for file names):

messages.pot: my_parser.c
    $(CC) $(CFLAGS) -DMAKE_TOKEN -o token_lister $<
    ./token_lister > my_parser.tokens.c
    # See Note 2 below
    $(CC) -o my_parser.tokens my_parser.tokens.c
    xgettext -o $@ my_parser.tokens.c

Once you have the translations, you still need to figure out how to use them, since bison does not offer an interface for inserting translated token names into its generated error messages. Probably the simplest way is to insert the translations directly into yytname at parser startup, by iterating through the array and replacing each token name with its translation. The annoyance is that the bison skeleton declares yytname as const; however, a very simple sed or awk invocation can remove the offending const. [Note 3]

Having said that, it's not at all clear to me that these automatically generated error messages are "user friendly", unless the user is surprisingly familiar with the language's formal grammar. And a user who is familiar with the grammar might well prefer the original token name, in order to find it in the grammar, rather than a non-expert translation which only coincidentally resembles the original concept. Not that I'm pointing fingers at anyone in particular.

You might enjoy Russ Cox's fascinating essay about how he implemented actually friendly error messages for Go.

Notes:

  1. Using the token name directly inside a C string won't work for tokens whose representation includes a " or a \. In particular, any token declared as a double-quoted string (such as "and" or "<=") will fail, as will the single-character tokens '"' and '\\'. These don't show up very often in grammars; if you're substituting internationalized keywords in your scanner, you're very unlikely to use bison's quoted-string feature at all.

If you do want to use such tokens, your generator will have to escape the " and \ characters in each token name before writing it into the gettext source file.

  2. Actually, it would be better to use several stanzas, but that one is enough to get you going, I think. You probably want to mark some or all of the intermediate results as .INTERMEDIATE. The generated executable my_parser.tokens can be used to verify the translations, but that's totally optional, so you might want to remove the line that builds it. On the other hand, it does verify that the strings are compilable.

  3. See Russ Cox's gc (the Go compiler discussed in the essay mentioned above) for an example. His Makefile modifies the bison output to remove the const from yytname, so that the generated parser can substitute his preferred token names into its error messages; you can see the general idea at work there.
