Spirit Lex: Which token definition generated this token?

Views: 208 Published: 2016/10/19 21:42:26 Tags: c++ boost boost-spirit boost-spirit-lex


Question

Sorry if this is a newbie question, but I need to know which token definition produced a certain token. When I print the token ID, I just get an integer. I need to know which regex generated this token.

Edit:

Here is how I define my tokens:

   template <typename LexerT>
   class Tokens: public lex::lexer<LexerT>
   {
      public:
         Tokens(const std::string& input):
            lineNo_(1)
         {
            using boost::spirit::lex::_start;
            using boost::spirit::lex::_end;
            using boost::spirit::lex::_pass;
            using boost::phoenix::ref;
            using boost::phoenix::construct;

            // macros
            this->self.add_pattern
               ("EXP",     "(e|E)(\\+|-)?\\d+")
               ("SUFFIX",  "[yzafpnumkKMGTPEZY]")
               ("INTEGER", "-?\\d+")
               ("FLOAT",    "-?(((\\d+)|(\\d*\\.\\d+)|(\\d+\\.\\d*))({EXP}|{SUFFIX})?)")
               ("SYMBOL",  "[a-zA-Z_?@](\\w|\\?|@)*")
               ("STRING",  "\\\"([^\\\"]|\\\\\\\")*\\\"");

            // whitespaces and comments
            whitespaces_ = "\\s+";
            comments_    = "(;[^\\n]*\\n)|(\\/\\*[^*]*\\*+([^/*][^*]*\\*+)*\\/)";

            // literals
            integer_ = "{INTEGER}";
            float_   = "{FLOAT}";
            symbol_  = "{SYMBOL}";
            string_  = "{STRING}";

            // operators
            quote_         = "'";
            backquote_     = '`';

            // ... other tokens

            // whitespace and comment rules
            this->self += whitespaces_ [ref(lineNo_) += count(_start, _end, '\n'), _pass = lex::pass_flags::pass_ignore];
            this->self += comments_    [ref(lineNo_) += count(_start, _end, '\n'), _pass = lex::pass_flags::pass_ignore];

            // literal rules
            this->self += integer_ | float_ | string_ | symbol_;
            // this->self += ... other tokens
         }

         ~Tokens() {}

         size_t lineNo() { return lineNo_; }


      private:
         // ignored tokens
         lex::token_def<lex::omit> whitespaces_, comments_;

         // literal tokens
         lex::token_def<int>          integer_;
         lex::token_def<std::string>  float_, symbol_, string_;

         // operator tokens
         lex::token_def<> quote_, backquote_;
         // ... other token definitions of type lex::token_def<>

         // current line number
         size_t lineNo_;
   };

Thanks,
Haitham

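Recommended answer

To ensure that every token is assigned an ID, the Spirit.Lex library internally assigns unique numbers to the token definitions, starting at boost::spirit::lex::min_token_id.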

So you can in fact get the incrementally assigned token ID. However, to make things a little more friendly and robust, I'd suggest making a helper function that determines the name of the token, so you can do something like this:

while (iter != end && token_is_valid(*iter))
{
    std::cout << "Token: " << 
       (iter->id() - lex::min_token_id) << ": " << 
       toklexer.nameof(iter) << " ('" << iter->value() << "')\n";
    ++iter;
}
if (iter == end) { std::cout << "lineNo: " << toklexer.lineNo() << "\n"; }

Which, for the input:

const std::string str = "symbol \"string\" \n"
    "this /* is a comment */\n"
    "31415926E-7 123";

prints:

Token: 5: symbol_ ('symbol')
Token: 4: string_ ('"string"')
Token: 5: symbol_ ('this')
Token: 3: float_ ('31415926E-7')
Token: 2: integer_ ('123')
lineNo: 3
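The numbers in that output follow from the order in which the definitions are added to the lexer: whitespaces_ and comments_ come first, then integer_, float_, string_ and symbol_. A minimal Boost-free sketch of the same idea is below. TokenNames, make_demo_names and kMinTokenId are hypothetical names invented for this illustration, and the value given to kMinTokenId is an assumption, not something taken from the Boost headers.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Stand-in for boost::spirit::lex::min_token_id. The value is an assumption
// made for this sketch, not read from the Boost headers.
const std::size_t kMinTokenId = 0x10000;

// Hypothetical registry: hands out IDs in registration order, the way the
// answer describes Spirit.Lex doing it, and remembers a name for each one.
class TokenNames
{
  public:
    TokenNames() : next_(kMinTokenId) {}

    // Register a definition and return the ID it was given.
    std::size_t add(const std::string& name)
    {
        names_[next_] = name;
        return next_++;
    }

    // Look a token ID up; unknown IDs fall back to "other", as in nameof().
    std::string nameof(std::size_t id) const
    {
        std::map<std::size_t, std::string>::const_iterator it = names_.find(id);
        return it == names_.end() ? "other" : it->second;
    }

  private:
    std::size_t next_;
    std::map<std::size_t, std::string> names_;
};

// Registers the six definitions in the order the demo adds them to the lexer.
TokenNames make_demo_names()
{
    TokenNames t;
    t.add("whitespaces_");
    t.add("comments_");
    t.add("integer_");
    t.add("float_");
    t.add("string_");
    t.add("symbol_");
    return t;
}
```

With this registration order, nameof(kMinTokenId + 5) gives "symbol_" and nameof(kMinTokenId + 3) gives "float_", which lines up with the offsets 5 and 3 printed above. A lookup table like this is one possible alternative to the chain of if statements in Tokens::nameof; whether it is worth the extra bookkeeping depends on how many token definitions there are.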

Notes

I don't think it is possible to identify the token down to the pattern expression, since that information is not exposed and is no longer available once the token is returned by the lexer.
I think I remember seeing tokens with debug info (similar to qi::rule<>::name()?), but I can't currently find the documentation for it. The implementation of the Tokens::nameof(It) function would be greatly simplified if you could reuse the debug name.

Fully working demo code (slightly adapted to Boost 1_49-1_57, GCC -std=c++0x):

Live on Coliru

#define BOOST_RESULT_OF_USE_DECLTYPE
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/phoenix/function/adapt_callable.hpp>
#include <algorithm> // std::count
#include <iostream>  // std::cout
#include <string>

namespace qi  = boost::spirit::qi;
namespace lex = boost::spirit::lex;
namespace phx = boost::phoenix;

///////////////////////////////////////////////////////////////////////////
// irrelevant for question: needed this locally to make it work with my boost
// version
namespace detail
{
   struct count
   {
      template<class It1, class It2, class T>
      struct result { typedef ptrdiff_t type; };

      template<class It1, class It2, class T>
      typename result<It1, It2, T>::type operator()(It1 f, It2 l, T const& x) const
      {
         return std::count(f, l, x);
      }
   };
}

BOOST_PHOENIX_ADAPT_CALLABLE(count, detail::count, 3);
///////////////////////////////////////////////////////////////////////////

template <typename LexerT>
class Tokens: public lex::lexer<LexerT>
{
   public:
      Tokens():
         lineNo_(1)
      {
         using lex::_start;
         using lex::_end;
         using lex::_pass;
         using phx::ref;

         // macros
         this->self.add_pattern
            ("EXP",     "(e|E)(\\+|-)?\\d+")
            ("SUFFIX",  "[yzafpnumkKMGTPEZY]")
            ("INTEGER", "-?\\d+")
            ("FLOAT",   "-?(((\\d+)|(\\d*\\.\\d+)|(\\d+\\.\\d*))({EXP}|{SUFFIX})?)")
            ("SYMBOL",  "[a-zA-Z_?@](\\w|\\?|@)*")
            ("STRING",  "\\\"([^\\\"]|\\\\\\\")*\\\"");

         // whitespaces and comments
         whitespaces_ = "\\s+";
         comments_    = "(;[^\\n]*\\n)|(\\/\\*[^*]*\\*+([^/*][^*]*\\*+)*\\/)";

         // literals
         integer_ = "{INTEGER}";
         float_   = "{FLOAT}";
         symbol_  = "{SYMBOL}";
         string_  = "{STRING}";

         // operators
         quote_         = "'";
         backquote_     = '`';

         // ... other tokens

         // whitespace and comment rules
         //this->self.add(whitespaces_, 1001)
         //              (comments_,    1002);

         this->self = whitespaces_ [phx::ref(lineNo_) += count(_start, _end, '\n'), _pass = lex::pass_flags::pass_ignore]
                    | comments_    [phx::ref(lineNo_) += count(_start, _end, '\n'), _pass = lex::pass_flags::pass_ignore];

         // literal rules
         this->self += integer_ | float_ | string_ | symbol_;
         // this->self += ... other tokens
      }

      // map a token's ID back to the name of the definition that produced it
      template <typename TokIter>
      std::string nameof(TokIter it)
      {
         if (it->id() == whitespaces_.id()) return "whitespaces_";
         if (it->id() == comments_.id())    return "comments_";
         if (it->id() == integer_.id())     return "integer_";
         if (it->id() == float_.id())       return "float_";
         if (it->id() == symbol_.id())      return "symbol_";
         if (it->id() == string_.id())      return "string_";

         if (it->id() == quote_.id())       return "quote_";
         if (it->id() == backquote_.id())   return "backquote_";
         return "other";
      }

      ~Tokens() {}

      size_t lineNo() { return lineNo_; }

   private:
      // ignored tokens
      lex::token_def</*lex::omit*/> whitespaces_, comments_;

      // literal tokens
      lex::token_def<int>          integer_;
      lex::token_def<std::string>  float_, symbol_, string_;

      // operator tokens
      lex::token_def<> quote_, backquote_;
      // ... other token definitions of type lex::token_def<>

      // current line number
      size_t lineNo_;
};

int main()
{
   const std::string str = "symbol \"string\" \n"
      "this /* is a comment */\n"
      "31415926E-7 123";

   typedef lex::lexertl::token<char const*> token_type;
   typedef lex::lexertl::actor_lexer<token_type> lexer_type;

   Tokens<lexer_type> toklexer;

   char const* first = str.c_str();
   char const* last = &first[str.size()];

   lexer_type::iterator_type iter = toklexer.begin(first, last);
   lexer_type::iterator_type end = toklexer.end();

   while (iter != end && token_is_valid(*iter))
   {
      std::cout << "Token: " <<
         (iter->id() - lex::min_token_id) << ": " <<
         toklexer.nameof(iter) << " ('" << iter->value() << "')\n";
      ++iter;
   }

   if (iter == end) { std::cout << "lineNo: " << toklexer.lineNo() << "\n"; }
   else
   {
      std::string rest(first, last);
      std::cout << "Lexical analysis failed\n" << "stopped at: \""
         << rest << "\"\n";
   }
   return 0;
}
