Lexing The VHDL ' (tick) Token

Problem description

In VHDL the ' character can either delimit a character literal, e.g. '.', or act as an attribute separator (somewhat like C++'s :: token), e.g. string'("hello").

The issue comes up when parsing an attribute name containing character literals, e.g. string'('a','b','c'). In this case a naive lexer will incorrectly tokenize the first '(' as a character literal, and all of the following actual character literals will be messed up.

There is a thread in the comp.lang.vhdl Google group from 2007, titled "Lexing the ' char", which asks a similar question and has an answer by user diogratia:

        case '\'':                          /* IR1045 check */

            if (    last_token == DELIM_RIGHT_PAREN ||
                    last_token == DELIM_RIGHT_BRACKET ||
                    last_token == KEYWD_ALL ||
                    last_token == IDENTIFIER_TOKEN ||
                    last_token == STR_LIT_TOKEN ||
                    last_token == CHAR_LIT_TOKEN || ! (buff_ptr<BUFSIZ-2) )
                token_flag = DELIM_APOSTROPHE;
            else if (is_graphic_char(NEXT_CHAR) &&
                    line_buff[buff_ptr+2] == '\'') { CHARACTER_LITERAL:
                buff_ptr+= 3;               /* lead,trailing \' and char */
                last_token = CHAR_LIT_TOKEN;
                token_strlen = 3;
                return (last_token);
            }
            else token_flag = DELIM_APOSTROPHE;
            break;

See Issue Report IR1045: http://www.eda-twiki.org/isac/IRs-VHDL-93/IR1045.txt

As you can see from the above code fragment, the last token can be captured and used to disambiguate something like:

  foo <= std_logic_vector'('a','b','c');

without a large lookahead or backtracking.
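
The last-token check described above can be sketched in a small hand-written scanner. This is a minimal C sketch under stated assumptions, not the code from the thread: the token kinds, struct lexer, tick_expected, next_token and count_tokens are hypothetical names invented for illustration. String literals are not handled, and the keyword all is simply lexed as an identifier, which falls in the same "tick expected" set anyway. Only the apostrophe rule is meant to be representative.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not taken from any real lexer. */
enum tok { T_NONE, T_ID, T_CHAR, T_TICK, T_LPAREN, T_RPAREN,
           T_RBRACKET, T_OTHER, T_EOF };

struct lexer {
    const char *src;   /* NUL-terminated input */
    size_t pos;        /* index of the next unread character */
    enum tok last;     /* most recent token returned */
};

/* IR1045-style rule: after these tokens an apostrophe is always a tick. */
static int tick_expected(enum tok last)
{
    return last == T_ID || last == T_CHAR ||
           last == T_RPAREN || last == T_RBRACKET;
}

static enum tok next_token(struct lexer *lx)
{
    assert(lx != NULL && lx->src != NULL);
    const char *s = lx->src;
    enum tok t;

    while (s[lx->pos] == ' ' || s[lx->pos] == '\t')
        lx->pos++;                      /* skip blanks */

    char c = s[lx->pos];
    if (c == '\0') {
        t = T_EOF;                      /* do not advance past the end */
    } else if (isalpha((unsigned char)c)) {
        while (isalnum((unsigned char)s[lx->pos]) || s[lx->pos] == '_')
            lx->pos++;
        t = T_ID;
    } else if (c == '\'') {
        /* Only try a character literal when the previous token allows it;
           the check on pos+1 keeps the read of pos+2 inside the string. */
        if (!tick_expected(lx->last) &&
            s[lx->pos + 1] != '\0' && s[lx->pos + 2] == '\'') {
            lx->pos += 3;               /* quote, character, quote */
            t = T_CHAR;
        } else {
            lx->pos += 1;
            t = T_TICK;
        }
    } else {
        lx->pos++;
        t = c == '(' ? T_LPAREN :
            c == ')' ? T_RPAREN :
            c == ']' ? T_RBRACKET : T_OTHER;
    }
    lx->last = t;                       /* remember for the next apostrophe */
    return t;
}

/* Count how many tokens of kind `want` the sketch finds in `src`. */
static int count_tokens(const char *src, enum tok want)
{
    struct lexer lx = { src, 0, T_NONE };
    int n = 0;
    for (enum tok t = next_token(&lx); t != T_EOF; t = next_token(&lx))
        if (t == want)
            n++;
    return n;
}
```

With these assumptions, count_tokens("std_logic_vector'('a','b','c')", T_CHAR) finds three character literals and count_tokens on the same input with T_TICK finds exactly one tick, using one remembered token and at most two characters of lookahead.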

However, as far as I know, flex doesn't track the last token that was parsed.

Is there a better way to do this lexing task without manually tracking the last parsed token?

I am using IntelliJ GrammarKit, if that helps.

Recommended answer

The idea behind IR1045 is to be able to tell whether a single quote/apostrophe is part of a character literal without looking ahead, or backtracking when you're wrong. Try:

library ieee;
use ieee.std_logic_1164.all;

entity foo is
    port (
        a:      in      std_logic;
        b:      out     std_logic_vector (3 downto 0)
    );
end entity;

architecture behave of foo is
    begin
    b <= std_logic_vector'('0','1','1','0')     when a = '1' else
         (others =>'0')                         when a = '0' else
         (others => 'X');
end architecture behave;

How far ahead are you willing to look?

There is, however, a practical example of flex disambiguation of apostrophes and character literals for VHDL.

Nick Gasson's nvc uses flex, in which he implemented an Issue Report 1045 solution.

See nvc/src/lexer.l, which is licensed under GPLv3.

Search for last_token:

#define TOKEN(t) return (last_token = (t))

#define TOKEN_LRM(t, lrm)                                       \
   if (standard() < lrm) {                                      \
      warn_at(&yylloc, "%s is a reserved word in VHDL-%s",      \
              yytext, standard_text(lrm));                      \
      return parse_id(yytext);                                  \
   }                                                            \
   else                                                         \
      return (last_token = (t));

An added function to check it:

static int resolve_ir1045(void);

static int last_token = -1;

Which is:

%%

static int resolve_ir1045(void)
{
   // See here for discussion:
   //   http://www.eda-stds.org/isac/IRs-VHDL-93/IR1045.txt
   // The set of tokens that may precede a character literal is
   // disjoint from that which may precede a single tick token.

   switch (last_token) {
   case tRSQUARE:
   case tRPAREN:
   case tALL:
   case tID:
      // Cannot be a character literal
      return 0;
   default:
      return 1;
   }
}

The IR1045 location has changed since the comp.lang.vhdl post; it's now

http://www.eda-twiki.org/isac/IRs-VHDL-93/IR1045.txt

You'll also want to search for resolve_ir1045 in lexer.l.

static int resolve_ir1045(void);

{CHAR}            { if (resolve_ir1045()) {
                       yylval.s = strdup(yytext);
                       TOKEN(tID);

There we find that nvc uses the function to filter the detection of the first single quote of a character literal.

This was originally an Ada issue. IR-1045 was never adopted but is universally used. There are probably Ada flex lexers that also demonstrate the disambiguation.

The requirement to disambiguate is discussed in Ada User Journal volume 27, number 3 (September 2006), in the article "Lexical Analysis" on PDF pages 30 and 31 (Volume 27 pages 159 and 160), where we see the solution is not well known.

The comment that character literals do not precede a single quote is inaccurate:

entity ir1045 is
end entity;

architecture foo of ir1045 is
begin
THIS_PROCESS:
    process
        type twovalue is ('0', '1');  
        subtype string4 is string(1 to 4);
        attribute a: string4;
        attribute a of '1' : literal is "TRUE";
    begin
        assert THIS_PROCESS.'1''a /= "TRUE"
            report "'1''a /= ""TRUE"" is FALSE";
        report "This_PROCESS.'1''a'RIGHT = " &
            integer'image(This_PROCESS.'1''a'RIGHT);
        wait;
    end process;
end architecture;

The first use of an attribute with a selected-name prefix whose suffix is a character literal demonstrates the inaccuracy; the second report statement shows it can matter:

ghdl -a ir1045.vhdl
ghdl -e ir1045
ghdl -r ir1045
ir1045.vhdl:13:9:@0ms:(assertion error): '1''a /= "TRUE" is FALSE
ir1045.vhdl:15:9:@0ms:(report note): This_PROCESS.'1''a'RIGHT = 4
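
Because a character literal such as '1' can itself be followed by a tick, as in '1''a above, a last-token predicate has to treat character literals as tokens after which an apostrophe is a tick. The 2007 fragment does this by listing CHAR_LIT_TOKEN and STR_LIT_TOKEN. The following is a minimal C sketch of such a predicate. The enum values and the name may_start_char_literal are hypothetical and chosen for illustration; they are not nvc's actual token codes or function.

```c
#include <assert.h>

/* Hypothetical token codes for illustration only; a real lexer would take
   these from its generated parser header. */
enum { tNONE = -1, tID, tCHAR, tSTRING, tRPAREN, tRSQUARE, tALL,
       tLPAREN, tCOMMA };

/* Returns 1 when an apostrophe following `last_token` may begin a
   character literal, and 0 when it must be a tick (attribute or
   qualified-expression separator). */
static int may_start_char_literal(int last_token)
{
    assert(last_token >= tNONE);
    switch (last_token) {
    case tRSQUARE:
    case tRPAREN:
    case tALL:
    case tID:
    case tCHAR:     /* covers '1''a: a literal followed by a tick */
    case tSTRING:   /* mirrors STR_LIT_TOKEN in the 2007 fragment */
        return 0;
    default:
        return 1;
    }
}
```

In this sketch may_start_char_literal(tLPAREN) returns 1, so '0' after an opening parenthesis is lexed as a character literal, while may_start_char_literal(tCHAR) returns 0, so the second apostrophe in '1''a is a tick.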

In addition to an attribute name prefix containing a selected name with a character literal suffix, there's a requirement that an attribute specification 'decorate' a declared entity (of an entity_class, see IEEE Std 1076-2008 7.2 Attribute specification) in the same declarative region in which the entity is declared.

This example is syntactically and semantically valid VHDL. You could note that nvc doesn't allow decorating a named entity with the entity class literal. That's not according to 7.2.

Enumeration literals are declared in type declarations, here type twovalue. An enumerated type that has at least one character literal as an enumeration literal is a character type (5.2.2.1).
