Parsing identifiers except keywords

Question

I am struggling to write an identifier parser that parses an alphanumeric string which is not a keyword. The keywords are all in a table:

struct keywords_t : x3::symbols<x3::unused_type> {
    keywords_t() {
        add("for", x3::unused)
                ("in", x3::unused)
                ("while", x3::unused);
    }
} const keywords;

and the parser for an identifier should be this:

auto const identifier_def =       
            x3::lexeme[
                (x3::alpha | '_') >> *(x3::alnum | '_')
            ];

Now I am trying to combine these so that the identifier parser fails when it parses a keyword. I tried it like this:

auto const identifier_def =       
                x3::lexeme[
                    (x3::alpha | '_') >> *(x3::alnum | '_')
                ]-keywords;

and like this:

auto const identifier_def =       
                x3::lexeme[
                    (x3::alpha | '_') >> *(x3::alnum | '_') - keywords
                ];

It works on most inputs, but if a string starts with a keyword, like int, whilefoo, or forbar, the parser fails to parse it. How can I get this parser right?

Recommended answer

Your problem is caused by the semantics of the difference operator in Spirit. When you have a - b, Spirit does the following:

  • Check whether b matches:
    • If it does, a - b fails and nothing is parsed.
    • If b fails, check whether a matches:
      • If a fails, a - b fails and nothing is parsed.
      • If a succeeds, a - b succeeds and parses whatever a parses.
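The steps above can be modelled without Spirit at all. The following is only a hand-written sketch of those semantics, not Spirit's implementation: the names Result, match_keyword, match_unchecked_identifier, difference and naive_identifier are made up for illustration, and each "parser" simply reports how many leading characters it would consume. It shows why unchecked_identifier - keywords rejects an input such as fortran: the keyword table matches the prefix for, so the difference fails before the identifier parser is ever tried.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical model: a parser returns the number of characters it consumes
// from the front of the input, or std::nullopt on failure.
using Result = std::optional<std::size_t>;

constexpr std::string_view kKeywords[] = {"for", "in", "while"};

// Stand-in for the keywords symbol table: succeeds on any keyword *prefix*.
Result match_keyword(std::string_view in) {
    for (std::string_view kw : kKeywords) {
        if (in.substr(0, kw.size()) == kw) return kw.size();
    }
    return std::nullopt;
}

// Stand-in for (alpha | '_') >> *(alnum | '_').
Result match_unchecked_identifier(std::string_view in) {
    auto is_start = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
    auto is_rest  = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (in.empty() || !is_start(in[0])) return std::nullopt;
    std::size_t n = 1;
    while (n < in.size() && is_rest(in[n])) ++n;
    return n;
}

// a - b: try b first; if b matches, fail. Otherwise the result is a's result.
template <typename A, typename B>
Result difference(A a, B b, std::string_view in) {
    if (b(in)) return std::nullopt;
    return a(in);
}

// The question's attempt: unchecked_identifier - keywords.
Result naive_identifier(std::string_view in) {
    return difference(match_unchecked_identifier, match_keyword, in);
}

// Usage: call this from main() to check the behaviour described above.
void check_naive_identifier() {
    assert(naive_identifier("foo").value_or(0) == 3);  // no keyword prefix
    assert(!naive_identifier("for"));      // rejected, as intended
    assert(!naive_identifier("fortran"));  // rejected too: "for" matches first
    assert(!naive_identifier("integer"));  // rejected too: "in" matches first
}
```

In this model the subtrahend never looks past the keyword itself, which is exactly the part that has to change.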

In your case (unchecked_identifier - keywords), as long as the identifier starts with a keyword, keywords will match and your parser will fail. So you need to replace keywords with something that matches only when a distinct keyword is passed and fails whenever the keyword is followed by further identifier characters. The not-predicate (!) can help with that:

auto const distinct_keyword = x3::lexeme[ keywords >> !(x3::alnum | '_') ];
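To make the effect of the not-predicate concrete, here is a Spirit-free sketch of the same idea. It is an illustration under assumptions rather than what X3 does internally: all function names are hypothetical, and the keyword lookup is simplified to a linear scan (it does not reproduce the longest-match behaviour of a symbol table, which makes no difference for these three keywords). The lookahead inspects the character after the keyword without consuming it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

using Result = std::optional<std::size_t>;  // characters consumed, or failure

constexpr std::string_view kKeywords[] = {"for", "in", "while"};

bool is_ident_char(unsigned char c) { return std::isalnum(c) || c == '_'; }

// Models keywords >> !(alnum | '_'): a keyword that is not followed by
// another identifier character.
Result match_distinct_keyword(std::string_view in) {
    for (std::string_view kw : kKeywords) {
        if (in.substr(0, kw.size()) != kw) continue;
        // Not-predicate: peek at the next character without consuming it.
        if (kw.size() < in.size() && is_ident_char(in[kw.size()])) continue;
        return kw.size();
    }
    return std::nullopt;
}

// Models (alpha | '_') >> *(alnum | '_').
Result match_unchecked_identifier(std::string_view in) {
    if (in.empty()) return std::nullopt;
    unsigned char first = static_cast<unsigned char>(in[0]);
    if (!(std::isalpha(first) || first == '_')) return std::nullopt;
    std::size_t n = 1;
    while (n < in.size() && is_ident_char(in[n])) ++n;
    return n;
}

// Models unchecked_identifier - distinct_keyword.
Result match_identifier(std::string_view in) {
    if (match_distinct_keyword(in)) return std::nullopt;
    return match_unchecked_identifier(in);
}

// True only if the whole input is one identifier.
bool is_identifier(std::string_view in) {
    Result r = match_identifier(in);
    return r && *r == in.size();
}

// Usage: call this from main() to check the behaviour described above.
void check_identifier() {
    assert(is_identifier("fortran"));  // keyword prefix, but more follows
    assert(!is_identifier("for"));     // a bare keyword is still rejected
    assert(is_identifier("integer"));
    assert(!is_identifier("in"));
}
```

Because the lookahead consumes nothing, putting it in front as a guard (check that no distinct keyword matches, then parse the identifier) gives the same result in this model as the difference form.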
          

Full sample (running on Coliru):

//#define BOOST_SPIRIT_X3_DEBUG
#include <iostream>
#include <string>
#include <boost/spirit/home/x3.hpp>

namespace parser {
    namespace x3 = boost::spirit::x3;

    // Symbol table holding the reserved words.
    struct keywords_t : x3::symbols<x3::unused_type> {
        keywords_t() {
            add("for", x3::unused)
               ("in", x3::unused)
               ("while", x3::unused);
        }
    } const keywords;

    x3::rule<struct identifier_tag, std::string> const identifier("identifier");

    // A keyword that is not followed by another identifier character.
    auto const distinct_keyword = x3::lexeme[ keywords >> !(x3::alnum | '_') ];
    auto const unchecked_identifier = x3::lexeme[(x3::alpha | x3::char_('_')) >> *(x3::alnum | x3::char_('_'))];

    auto const identifier_def = unchecked_identifier - distinct_keyword;

    // This should also work:
    // auto const identifier_def = !distinct_keyword >> unchecked_identifier;

    BOOST_SPIRIT_DEFINE(identifier);

    // True only if the whole input is a single identifier.
    bool is_identifier(const std::string& input)
    {
        auto iter = std::begin(input), end = std::end(input);

        bool result = x3::phrase_parse(iter, end, identifier, x3::space);

        return result && iter == end;
    }
}

int main() {
    std::cout << parser::is_identifier("fortran") << std::endl;        // 1
    std::cout << parser::is_identifier("for") << std::endl;            // 0
    std::cout << parser::is_identifier("integer") << std::endl;        // 1
    std::cout << parser::is_identifier("in") << std::endl;             // 0
    std::cout << parser::is_identifier("whileechoyote") << std::endl;  // 1
    std::cout << parser::is_identifier("while") << std::endl;          // 0
}
