使用Boost.Spirit Qi和Lex时的空白跳线 [英] Whitespace skipper when using Boost.Spirit Qi and Lex

查看:103
本文介绍了使用Boost.Spirit Qi和Lex时的空白跳线的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

让我们考虑以下代码:

#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;

template<typename Lexer>
class expression_lexer
    : public lex::lexer<Lexer>
{
public:
    typedef lex::token_def<> operator_token_type;
    typedef lex::token_def<> value_token_type;
    typedef lex::token_def<> variable_token_type;
    typedef lex::token_def<lex::omit> parenthesis_token_type;
    typedef std::pair<parenthesis_token_type, parenthesis_token_type> parenthesis_token_pair_type;
    typedef lex::token_def<lex::omit> whitespace_token_type;

    expression_lexer()
        : operator_add('+'),
          operator_sub('-'),
          operator_mul("[x*]"),
          operator_div("[:/]"),
          value("\\d+(\\.\\d+)?"),
          variable("%(\\w+)"),
          parenthesis({
            std::make_pair(parenthesis_token_type('('), parenthesis_token_type(')')),
            std::make_pair(parenthesis_token_type('['), parenthesis_token_type(']'))
          }),
          whitespace("[ \\t]+")
    {
        this->self
            = operator_add
            | operator_sub
            | operator_mul
            | operator_div
            | value
            | variable
            ;

        std::for_each(parenthesis.cbegin(), parenthesis.cend(),
            [&](parenthesis_token_pair_type const& token_pair)
            {
                this->self += token_pair.first | token_pair.second;
            }
        );

        this->self("WS") = whitespace;
    }

    operator_token_type operator_add;
    operator_token_type operator_sub;
    operator_token_type operator_mul;
    operator_token_type operator_div;

    value_token_type value;
    variable_token_type variable;

    std::vector<parenthesis_token_pair_type> parenthesis;

    whitespace_token_type whitespace;
};

template<typename Iterator, typename Skipper>
class expression_grammar
    : public qi::grammar<Iterator, Skipper>
{
public:
    template<typename Tokens>
    explicit expression_grammar(Tokens const& tokens)
        : expression_grammar::base_type(start)
    {
        start                     %= expression >> qi::eoi;

        expression                %= sum_operand >> -(sum_operator >> expression);
        sum_operator              %= tokens.operator_add | tokens.operator_sub;
        sum_operand               %= fac_operand >> -(fac_operator >> sum_operand);
        fac_operator              %= tokens.operator_mul | tokens.operator_div;

        if(!tokens.parenthesis.empty())
            fac_operand           %= parenthesised | terminal;
        else
            fac_operand           %= terminal;

        terminal                  %= tokens.value | tokens.variable;

        if(!tokens.parenthesis.empty())
        {
            parenthesised         %= tokens.parenthesis.front().first >> expression >> tokens.parenthesis.front().second;
            std::for_each(tokens.parenthesis.cbegin() + 1, tokens.parenthesis.cend(),
                [&](typename Tokens::parenthesis_token_pair_type const& token_pair)
                {
                    parenthesised %= parenthesised.copy() | (token_pair.first >> expression >> token_pair.second);
                }
            );
        }
    }

private:
    qi::rule<Iterator, Skipper> start;
    qi::rule<Iterator, Skipper> expression;
    qi::rule<Iterator, Skipper> sum_operand;
    qi::rule<Iterator, Skipper> sum_operator;
    qi::rule<Iterator, Skipper> fac_operand;
    qi::rule<Iterator, Skipper> fac_operator;
    qi::rule<Iterator, Skipper> terminal;
    qi::rule<Iterator, Skipper> parenthesised;
};


int main()
{
    typedef lex::lexertl::token<std::string::const_iterator> token_type;
    typedef expression_lexer<lex::lexertl::lexer<token_type>> expression_lexer_type;
    typedef expression_lexer_type::iterator_type expression_lexer_iterator_type;
    typedef qi::in_state_skipper<expression_lexer_type::lexer_def> skipper_type;
    typedef expression_grammar<expression_lexer_iterator_type, skipper_type> expression_grammar_type;

    expression_lexer_type lexer;
    expression_grammar_type grammar(lexer);

    while(std::cin)
    {
        std::string line;
        std::getline(std::cin, line);

        std::string::const_iterator first = line.begin();
        std::string::const_iterator const last = line.end();

        bool const result = lex::tokenize_and_phrase_parse(first, last, lexer, grammar, qi::in_state("WS")[lexer.self]);
        if(!result)
            std::cout << "Parsing failed! Reminder: >" << std::string(first, last) << "<" << std::endl;
        else
        {
            if(first != last)
                std::cout << "Parsing succeeded! Reminder: >" << std::string(first, last) << "<" << std::endl;
            else
                std::cout << "Parsing succeeded!" << std::endl;
        }
    }
}

它是具有值和变量的算术表达式的简单解析器.它是使用expression_lexer提取令牌,然后使用expression_grammar解析令牌构建的.

It is a simple parser for arithmetic expressions with values and variables. It is build using expression_lexer for extracting tokens, and then with expression_grammar to parse the tokens.

在这么小的情况下使用lexer似乎是一个过大的杀伤力,可能是一个.但这就是简化示例的代价.还要注意,使用lexer可以轻松地使用正则表达式定义令牌,而可以通过外部代码(尤其是用户提供的配置)轻松地定义令牌.使用提供的示例,从外部配置文件中读取令牌的定义根本就没有问题,例如,允许用户将变量从%name更改为$name.

Use of lexer for such a small case might seem an overkill and probably is one. But that is the cost of simplified example. Also note that use of lexer allows to easily define tokens with regular expression while that allows to easily define them by external code (and user provided configuration in particular). With the example provided it would be no issue at all to read definition of tokens from an external config file and for example allow user to change variables from %name to $name.

该代码似乎运行正常(已在Visual Studio 2013中使用Boost 1.61进行了检查).除非我已经注意到,如果我提供像5++5这样的字符串,它会正常失败,但是仅作为提醒报告5而不是+5,这意味着违规的+被不可恢复"地消耗了.显然,生成的但与语法不匹配的标记绝不会返回到原始输入.但这不是我要问的.只是我在检查代码时意识到的便笺.

The code seems to be working fine (checked on Visual Studio 2013 with Boost 1.61). Except that I have noticed that if I provide string like 5++5 it properly fails but reports as reminder just 5 rather than +5 which means the offending + was "unrecoverably" consumed. Apparently a token that was produced but did not match grammar is in no way returned to the original input. But that is not what I'm asking about. Just a side note I realized when checking the code.

现在问题出在空白跳过上.我非常不喜欢它的完成方式.虽然我是用这种方式完成的,但它似乎是许多示例所提供的一种,其中包括有关StackOverflow问题的答案.

Now the problem is with whitespace skipping. I very much don't like how it is done. While I have done it this way as it seems to be the one provided by many examples including answers to questions here on StackOverflow.

最糟糕的事情似乎是(没有文献记录?)qi::in_state_skipper.同样,似乎我必须像这样添加whitespace令牌(带有名称),而不是像其他所有令牌一样添加令牌,因为使用lexer.whitespace而不是"WS"似乎不起作用.

The worst thing seems to be that (nowhere documented?) qi::in_state_skipper. Also it seems that I have to add the whitespace token like that (with a name) rather than like all the other ones as using lexer.whitespace instead of "WS" doesn't seem to work.

最后不得不用Skipper参数杂乱"语法似乎不太好.我不应该摆脱它吗?毕竟,我想基于标记而不是直接输入来创建语法,并且希望将空白从标记流中排除-在那里不再需要它了!

And finally having to "clutter" the grammar with the Skipper argument doesn't seem nice. Shouldn't I be free of it? After all I want to make the grammar based on tokens rather than direct input and I want the whitespace to be excluded from tokens stream - it is not needed there anymore!

我还需要跳过空格的哪些其他选项?像现在这样进行操作有什么优势?

What other options do I have to skip whitespaces? What are advantages of doing it like it is now?

推荐答案

仅出于某些奇怪的原因,我现在发现了一个不同的问题,其他解决方案提供了到空白的跳过.更好的一个!

For some strange reason only now I found a different question, Boost.Spirit SQL grammar/lexer failure, where some other solution to whitespace skipping is provided. A better one!

因此,下面的示例代码根据其中的建议进行了重新设计:

So below is the example code reworked along the suggestions there:

#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;

template<typename Lexer>
class expression_lexer
    : public lex::lexer<Lexer>
{
public:
    typedef lex::token_def<> operator_token_type;
    typedef lex::token_def<> value_token_type;
    typedef lex::token_def<> variable_token_type;
    typedef lex::token_def<lex::omit> parenthesis_token_type;
    typedef std::pair<parenthesis_token_type, parenthesis_token_type> parenthesis_token_pair_type;
    typedef lex::token_def<lex::omit> whitespace_token_type;

    expression_lexer()
        : operator_add('+'),
          operator_sub('-'),
          operator_mul("[x*]"),
          operator_div("[:/]"),
          value("\\d+(\\.\\d+)?"),
          variable("%(\\w+)"),
          parenthesis({
            std::make_pair(parenthesis_token_type('('), parenthesis_token_type(')')),
            std::make_pair(parenthesis_token_type('['), parenthesis_token_type(']'))
          }),
          whitespace("[ \\t]+")
    {
        this->self
            += operator_add
            | operator_sub
            | operator_mul
            | operator_div
            | value
            | variable
            | whitespace [lex::_pass = lex::pass_flags::pass_ignore]
            ;

        std::for_each(parenthesis.cbegin(), parenthesis.cend(),
            [&](parenthesis_token_pair_type const& token_pair)
            {
                this->self += token_pair.first | token_pair.second;
            }
        );
    }

    operator_token_type operator_add;
    operator_token_type operator_sub;
    operator_token_type operator_mul;
    operator_token_type operator_div;

    value_token_type value;
    variable_token_type variable;

    std::vector<parenthesis_token_pair_type> parenthesis;

    whitespace_token_type whitespace;
};

template<typename Iterator>
class expression_grammar
    : public qi::grammar<Iterator>
{
public:
    template<typename Tokens>
    explicit expression_grammar(Tokens const& tokens)
        : expression_grammar::base_type(start)
    {
        start                     %= expression >> qi::eoi;

        expression                %= sum_operand >> -(sum_operator >> expression);
        sum_operator              %= tokens.operator_add | tokens.operator_sub;
        sum_operand               %= fac_operand >> -(fac_operator >> sum_operand);
        fac_operator              %= tokens.operator_mul | tokens.operator_div;

        if(!tokens.parenthesis.empty())
            fac_operand           %= parenthesised | terminal;
        else
            fac_operand           %= terminal;

        terminal                  %= tokens.value | tokens.variable;

        if(!tokens.parenthesis.empty())
        {
            parenthesised         %= tokens.parenthesis.front().first >> expression >> tokens.parenthesis.front().second;
            std::for_each(tokens.parenthesis.cbegin() + 1, tokens.parenthesis.cend(),
                [&](typename Tokens::parenthesis_token_pair_type const& token_pair)
                {
                    parenthesised %= parenthesised.copy() | (token_pair.first >> expression >> token_pair.second);
                }
            );
        }
    }

private:
    qi::rule<Iterator> start;
    qi::rule<Iterator> expression;
    qi::rule<Iterator> sum_operand;
    qi::rule<Iterator> sum_operator;
    qi::rule<Iterator> fac_operand;
    qi::rule<Iterator> fac_operator;
    qi::rule<Iterator> terminal;
    qi::rule<Iterator> parenthesised;
};


int main()
{
    typedef lex::lexertl::token<std::string::const_iterator> token_type;
    typedef expression_lexer<lex::lexertl::actor_lexer<token_type>> expression_lexer_type;
    typedef expression_lexer_type::iterator_type expression_lexer_iterator_type;
    typedef expression_grammar<expression_lexer_iterator_type> expression_grammar_type;

    expression_lexer_type lexer;
    expression_grammar_type grammar(lexer);

    while(std::cin)
    {
        std::string line;
        std::getline(std::cin, line);

        std::string::const_iterator first = line.begin();
        std::string::const_iterator const last = line.end();

        bool const result = lex::tokenize_and_parse(first, last, lexer, grammar);
        if(!result)
            std::cout << "Parsing failed! Reminder: >" << std::string(first, last) << "<" << std::endl;
        else
        {
            if(first != last)
                std::cout << "Parsing succeeded! Reminder: >" << std::string(first, last) << "<" << std::endl;
            else
                std::cout << "Parsing succeeded!" << std::endl;
        }
    }
}

区别如下:

  1. whitespace令牌与所有其他令牌一样被添加到lexer的self中.
  2. 但是,有一个动作与之关联.该操作使词法分析器忽略令牌.正是我们想要的.
  3. 我的expression_grammar不再使用Skipper模板参数.因此,它也从规则中删除.
  4. 使用
  5. lex::lexertl::actor_lexer代替lex::lexertl::lexer,因为现在有一个与令牌关联的动作.
  6. 我打电话给tokenize_and_parse而不是tokenize_and_phrase_parse,因为我不再需要传递船长了.
  7. 我还把词法分析器中的第一个赋值从=更改为+=,因为它似乎更灵活(可以抵抗顺序更改).但这不会影响此处的解决方案.
  1. whitespace token is added to lexer's self as all other tokens.
  2. However, an action is associated with it. The action makes the lexer ignore the token. Which is exactly what we want.
  3. My expression_grammar no longer takes Skipper template argument. And so it is also removed from rules.
  4. lex::lexertl::actor_lexer is used instead of lex::lexertl::lexer since now there is an action associated with a token.
  5. I'm calling tokenize_and_parse instead of tokenize_and_phrase_parse as I don't need to pass skipper anymore.
  6. Also I changed first assignment to this->self in lexer from = to += as it seems more flexible (resistant to order changes). But it doesn't affect the solution here.

我对此很满意.它完美地满足了我的需求(或者说我的口味).但是,我想知道这种变化是否还有其他后果吗?在某些情况下会优先使用任何方法吗?我不知道.

I'm good with this. It suites my needs (or better to say my taste) perfectly. However I wonder whether there are any other consequences of such change? Is any approach preferred in some situations? That I don't know.

这篇关于使用Boost.Spirit Qi和Lex时的空白跳线的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆