How to make a boost::spirit parser and lexer deal with include files


Question

This is a do-nothing lexer and parser: it simply returns the string it read. I would like to extend it so that it can handle a C++-like include statement. I can imagine how to do this, but I would like to know whether there is an easier or already available way. If I had to do it myself, I would implement my own iterator (to be passed to the lexer). This iterator would contain:

  • an index into the string (possibly using -1 to indicate the end() iterator)
  • a pointer to this string

On encountering an include statement, the lexer would insert the file's contents into the string at the current position, overwriting the include statement. How would you do this?
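The iterator described above could look roughly like the following sketch. The names `index_iterator` and `end_index` are hypothetical, and whether this type meets every requirement Spirit Lex places on its input iterators has not been verified here. Because it stores an index rather than a raw character pointer, it stays usable when the string reallocates.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Index value that marks the end() iterator (the -1 from the description).
constexpr std::ptrdiff_t end_index = -1;

// Hypothetical forward iterator: a pointer to the string plus an index.
class index_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type        = char;
    using difference_type   = std::ptrdiff_t;
    using pointer           = char const*;
    using reference         = char const&;

    index_iterator() = default;  // default-constructed means end()
    explicit index_iterator(std::string const* s)
        : str_(s), pos_(s->empty() ? -1 : 0) {}

    reference operator*() const {
        assert(str_ && pos_ != end_index);
        return (*str_)[static_cast<std::size_t>(pos_)];
    }
    index_iterator& operator++() {
        assert(str_ && pos_ != end_index);
        ++pos_;
        // Running off the end of the string turns this into end().
        if (static_cast<std::size_t>(pos_) >= str_->size()) pos_ = end_index;
        return *this;
    }
    index_iterator operator++(int) {
        index_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(index_iterator const& a, index_iterator const& b) {
        // All end iterators compare equal, whatever string they refer to.
        if (a.pos_ == end_index || b.pos_ == end_index) return a.pos_ == b.pos_;
        return a.str_ == b.str_ && a.pos_ == b.pos_;
    }
    friend bool operator!=(index_iterator const& a, index_iterator const& b) {
        return !(a == b);
    }

private:
    std::string const* str_ = nullptr;
    std::ptrdiff_t     pos_ = end_index;
};
```

Text spliced in at or after the current index is picked up as the iterator advances. Text inserted before the current index shifts what the index refers to, so the code doing the splice would have to account for that.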

Here is my do-nothing lexer/parser:

#include <boost/phoenix.hpp>
#include <boost/bind.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

namespace lex     = boost::spirit::lex;
namespace qi      = boost::spirit::qi;
namespace phoenix = boost::phoenix;


template<typename Lexer>
class lexer:public lex::lexer<Lexer>
{   public:
    typedef lex::token_def<char> char_token_type;
    char_token_type m_sChar;
    //lex::token_def<lex::omit> m_sInclude;
    lexer(void)
        : m_sChar(".")//,
        //m_sInclude("^#include \"[^\"]*\"")
    {   this->self += m_sChar;
    }
};

template<typename Iterator>
class grammar : public qi::grammar<Iterator, std::string()>
{   public:
    qi::rule<Iterator, std::string()> m_sStart;
    template<typename Tokens>
    explicit grammar(Tokens const& tokens)
        : grammar::base_type(m_sStart)
    {   m_sStart %= *tokens.m_sChar >> qi::eoi;
    }
};


int main(int, char**)
{
    typedef lex::lexertl::token<std::string::const_iterator, boost::mpl::vector<char> > token_type;
    typedef lexer<lex::lexertl::actor_lexer<token_type> > expression_lexer_type;
    typedef expression_lexer_type::iterator_type expression_lexer_iterator_type;
    typedef grammar<expression_lexer_iterator_type> expression_grammar_type;

    expression_lexer_type lexer;
    expression_grammar_type grammar(lexer);
    const std::string s_ac = "this is a test\n\
#include \"test.dat\"\n\
";
    std::string s;
    auto pBegin = std::begin(s_ac);
    lex::tokenize_and_parse(pBegin, std::end(s_ac), lexer, grammar, s);
}

Recommended answer

Firstly, a preprocessor based on Spirit already exists: Boost Wave (see also "How do I implement include directives using boost::spirit::lex?").

Secondly, "inserting the contents of the include file into the string value" is both useless (for lexing purposes) and highly inefficient:

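  • it's useless because the include file would form a single token (!?), which means your parser couldn't act on the included contents
  • it's not generic because nested includes won't be handled this way
  • even if the goal is merely to copy include files verbatim to an equivalent output stream, it is highly inefficient to do so by copying the contents entirely into memory and passing them through the lexer into the parser, only to stream them out again. You could just siphon the input stream into the output stream with minimal allocations.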
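For the verbatim-copy case in the last point, a minimal sketch of siphoning one stream into another with the standard library follows. The helper names `copy_stream` and `copy_file` are hypothetical, not part of Boost or the original answer.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies everything from `in` to `out` through the stream buffer, without
// first collecting the contents in a std::string.
inline void copy_stream(std::istream& in, std::ostream& out) {
    // Inserting an empty buffer would set failbit on `out`, so check first.
    if (in.peek() != std::istream::traits_type::eof())
        out << in.rdbuf();
}

// Opens `path` and streams it to `out`; returns false if the file cannot
// be opened or the output stream ends up in a failed state.
inline bool copy_file(std::string const& path, std::ostream& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    copy_stream(in, out);
    return static_cast<bool>(out);
}
```

A driver that sees an include directive can call `copy_file` with the quoted path and the output stream, so the included text never passes through the lexer or parser.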

I'd suggest any combination of the following:

  • separate concerns: don't conflate parsing with interpreting. So, if you're going to parse include directives, return a representation of the include statements that can then be passed to the code that interprets them

  • a special, stronger case of separation of concerns is to move the include handling to a preprocessing stage. Indeed, a custom iterator type could do the trick, but I'd build the lexer on top of it, so the lexer doesn't have to know about includes; it just lexes the source without (having to) know its exact origin.
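Both suggestions can be illustrated with a small standard-library sketch. It is an assumption-laden example, not the answer's implementation: `include_directive`, `parse_include`, `loader` and `expand_includes` are hypothetical names, the directive syntax is taken from the commented-out token pattern in the question, and for brevity the expanded text is materialised in a std::string instead of being exposed through a custom iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>

// Representation of one parsed directive. Parsing only produces this value;
// a separate step decides what to do with it.
struct include_directive {
    std::string path;
};

// Recognises a line of the form  #include "path"  at the start of the line.
inline bool parse_include(std::string const& line, include_directive& out) {
    static std::string const prefix = "#include \"";
    if (line.compare(0, prefix.size(), prefix) != 0) return false;
    std::size_t const close = line.find('"', prefix.size());
    if (close == std::string::npos) return false;
    out.path = line.substr(prefix.size(), close - prefix.size());
    return true;
}

// Callback mapping an include path to its text (file system, cache, ...).
using loader = std::function<std::string(std::string const&)>;

// Interprets the directives: returns `source` with every include line
// replaced by the recursively expanded text the loader returns for it.
// Every output line ends in '\n', even if the input's last line did not.
inline std::string expand_includes(std::string const& source,
                                   loader const& load, int depth = 0) {
    if (depth > 32) throw std::runtime_error("include nesting too deep");
    std::istringstream in(source);
    std::string result, line;
    while (std::getline(in, line)) {
        include_directive inc;
        if (parse_include(line, inc))
            result += expand_includes(load(inc.path), load, depth + 1);
        else
            result += line + '\n';
    }
    return result;
}
```

The expanded string can then be handed to `lex::tokenize_and_parse` exactly as `s_ac` is in the question, so the lexer and grammar stay unaware of includes. The depth limit is an arbitrary guard against files that include each other.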
