Boost Spirit Signals Successful Parsing Despite Token Being Incomplete


Problem description

I have a very simple path construct that I am trying to parse with boost spirit.lex.

We have the following grammar:

token := [a-z]+
path := (token : path) | (token)

So we're just talking about colon-separated lower-case ASCII strings here.

I have three examples "xyz", "abc:xyz", "abc:xyz:".

The first two should be deemed valid. The third one, which has a trailing colon, should not be deemed valid. Unfortunately the parser I have recognizes all three as being valid. The grammar should not allow an empty token, but apparently spirit is doing just that. What am I missing to get the third one rejected?
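For reference, the acceptance rule described above can be sketched without Spirit as a small hand-written check. This is only an illustration of the intended behavior, not the Spirit-based parser from the question; the function name is_valid_path is hypothetical, and the sketch assumes an ASCII character set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Boost.Spirit): returns true only if the
// WHOLE string matches
//   token := [a-z]+
//   path  := token ':' path | token
bool is_valid_path(const std::string& s)
{
    std::size_t i = 0;
    while (true) {
        // token := [a-z]+  (at least one lower-case letter is required)
        const std::size_t start = i;
        while (i < s.size() && s[i] >= 'a' && s[i] <= 'z') ++i;
        if (i == start) return false;    // empty token, e.g. after a trailing ':'
        if (i == s.size()) return true;  // input ended right after a token
        if (s[i] != ':') return false;   // only ':' may follow a token
        ++i;                             // consume ':' and expect another token
    }
}
```

Under this rule "xyz" and "abc:xyz" are accepted, while "abc:xyz:" is rejected because the final colon is not followed by a token.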

Also, if you read the code below, in comments there is another version of the parser that demands that all paths end with semi-colons. I can get appropriate behavior when I activate those lines, (i.e. rejection of "abc:xyz:;"), but this is not really what I want.

Anyone have any ideas?

Thanks.

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>

#include <iostream>
#include <string>
#include <vector>

using namespace boost::spirit;
using boost::phoenix::val;

template<typename Lexer>
struct PathTokens : boost::spirit::lex::lexer<Lexer>
{
      PathTokens()
      {
         identifier = "[a-z]+";
         separator = ":";

         this->self.add
            (identifier)
            (separator)
            (';')
            ;
      }
      boost::spirit::lex::token_def<std::string> identifier, separator;
};


template <typename Iterator>
struct PathGrammar 
   : boost::spirit::qi::grammar<Iterator> 
{
      template <typename TokenDef>
      PathGrammar(TokenDef const& tok)
         : PathGrammar::base_type(path)
      {
         using boost::spirit::_val;
         path
            = 
            (token >> tok.separator >> path)[std::cerr << _1 << "\n"]
            |
            //(token >> ';')[std::cerr << _1 << "\n"]
            (token)[std::cerr << _1 << "\n"]
             ; 

          token 
             = (tok.identifier) [_val=_1]
          ;

      }
      boost::spirit::qi::rule<Iterator> path;
      boost::spirit::qi::rule<Iterator, std::string()> token;
};


int main()
{
   typedef std::string::iterator BaseIteratorType;
   typedef boost::spirit::lex::lexertl::token<BaseIteratorType, boost::mpl::vector<std::string> > TokenType;
   typedef boost::spirit::lex::lexertl::lexer<TokenType> LexerType;
   typedef PathTokens<LexerType>::iterator_type TokensIterator;
   typedef std::vector<std::string> Tests;

   Tests paths;
   paths.push_back("abc");
   paths.push_back("abc:xyz");
   paths.push_back("abc:xyz:");
   /*
     paths.clear();
     paths.push_back("abc;");
     paths.push_back("abc:xyz;");
     paths.push_back("abc:xyz:;");
   */
   for ( Tests::iterator iter = paths.begin(); iter != paths.end(); ++iter )
   {
      std::string str = *iter;
      std::cerr << "*****" << str << "*****\n";

      PathTokens<LexerType> tokens;
      PathGrammar<TokensIterator> grammar(tokens);

      BaseIteratorType first = str.begin();
      BaseIteratorType last = str.end();

      bool r = boost::spirit::lex::tokenize_and_parse(first, last, tokens, grammar);

      std::cerr << r << " " << (first==last) << "\n";
   }
}

Solution

The problem lies in the meaning of first and last after your call to tokenize_and_parse. The check first==last only tells you whether your string has been completely tokenized; it tells you nothing about whether the grammar consumed every token. If you isolate the parsing like this, you obtain the expected result:

  PathTokens<LexerType> tokens;
  PathGrammar<TokensIterator> grammar(tokens);

  BaseIteratorType first = str.begin();
  BaseIteratorType last = str.end();

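  LexerType::iterator_type lexfirst = tokens.begin(first,last);
  LexerType::iterator_type lexlast = tokens.end();

  // Run the grammar over the token stream directly so the token iterators can be inspected.
  bool r = boost::spirit::qi::parse(lexfirst, lexlast, grammar);

  // r reports a match; lexfirst==lexlast reports whether every token was consumed.
  std::cerr << r << " " << (lexfirst==lexlast) << "\n";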
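To make the distinction between "the parser matched" and "all input was consumed" concrete, here is a minimal hand-written sketch that does not depend on Boost. It works on characters rather than lexer tokens, and the names parse_token, parse_path and matches_fully are hypothetical. Like a Spirit parser, parse_path succeeds as soon as it matches a prefix of the input, so the caller still has to compare the iterators.

```cpp
#include <cassert>
#include <string>

typedef std::string::const_iterator It;

// token := [a-z]+ ; on success, advances `first` past the token.
bool parse_token(It& first, It last)
{
    It it = first;
    while (it != last && *it >= 'a' && *it <= 'z') ++it;
    if (it == first) return false;  // no characters matched
    first = it;
    return true;
}

// path := token ':' path | token
// Matches a prefix of the input: returns true once one alternative succeeds
// and leaves `first` at the first unconsumed character.
bool parse_path(It& first, It last)
{
    It it = first;
    if (!parse_token(it, last)) return false;
    const It after_token = it;

    // First alternative: token ':' path
    if (it != last && *it == ':') {
        ++it;
        if (parse_path(it, last)) {
            first = it;
            return true;
        }
    }

    // Second alternative: backtrack to just after the token.
    first = after_token;
    return true;
}

// A full match requires success AND no leftover input.
bool matches_fully(const std::string& s)
{
    It first = s.begin();
    const It last = s.end();
    return parse_path(first, last) && first == last;
}
```

For "abc:xyz:" parse_path returns true but stops in front of the trailing colon, so only the combined check in matches_fully rejects the input. That is the same role that comparing lexfirst with lexlast plays in the snippet above.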
