如何为用户提供给定boost :: spirit语法的自动完成建议? [英] How to provider user with autocomplete suggestions for given boost::spirit grammar?

查看:64
本文介绍了如何为用户提供给定boost :: spirit语法的自动完成建议?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Boost :: Spirit在我的C ++ GUI应用程序中为非技术用户构建简单的数据过滤器"语言.语言与普通英语非常相似,可以解析为AST.我被要求使该过程尽可能地易于使用,因此我希望提供类似CLang的错误消息(无法识别的'tokken',您的意思是'token'?")和自动补全功能.

I am using Boost::Spirit to build simple "data-filter" language in my C++ GUI application for non-technical users. Language is very similiar to plain English and parse-able into AST. I am requested to make the process as user-friendly as possible, so I wish to provide CLang-like error messages ("Unrecognized 'tokken', did you meant 'token'?") and autocompletion.

当前的问题是如何使用Boost :: Spirit语法为两个目标生成可能的令牌列表(我将使用简单的字符串距离算法来满足第一个要求)?

The question in place is how to use Boost::Spirit grammar to generate possible token list for both goals (I will use simple string distance algorithm to satisfy first requirement)?

到目前为止我的想法:

  • 将源流中的位置和长度添加到从解析器获得的令牌定义中.
  • 在语法中添加特殊的无效标记,该标记将匹配任何内容...嗯...无效:-)
  • 如果用户按下ctrl +空格,则构建AST(使用无效令牌,该树将始终可构建),然后在当前光标位置内查找令牌
  • 在所有可能的令牌上使用简单的匹配器(毕竟我有令牌列表)并准备建议列表

此解决方案的问题是,该建议还将建议对于给定位置无效的令牌.而且,如果我添加(并且将添加)可定义的标识符,那么我手头的问题就更大了……

The problem with this solution is the suggestion will also suggest tokens that are invalid for given place. And if I add (and I will) definable identifiers, I have much bigger problem in hand...

另一个限制:我只想在一种地方定义这种语言的语法;如果语法发生变化,我想让自动完成程序在重新编译后意识到这一变化

One more constraint: I want to have grammar for this language defined in only one place; If the grammar changes, I want to autocompleter be aware of this changes after recompilation

推荐答案

出于好奇,我决定尝试这个概念.

Out of curiosity, I decided to try the concept.

这是我的尝试.

让我们通过函数调用来解析算术表达式.

Let's parse arithmetic expressions with function calls.

现在,我们想使用可能的未知标识符来解析(部分)表达式.

Now, we want to parse (partial) expression with possible unknown identifiers.

在表达式不完整的情况下,我们希望暗示"最小标记以完成它(并可能继续解析).

In case of incomplete expressions, we want to "imply" the minimal token to complete it (and potentially continue parsing).

在标识符未知的情况下,我们希望对域中的已知标识符(变量或函数)进行模糊匹配,并按概率递减的顺序对其进行排名.

In case of unknown identifiers, we want to fuzzy-match the known identifiers in the domain (either variables or functions) and rank them in order of decreasing probability.

让我们首先确定我们希望输入存储在内存中,因此我们可以使用string_view s来引用位置/子字符串:

Let's start out by deciding we want our input to be in memory, so we can refer to locations/substrings by using string_views:

#include <boost/utility/string_view.hpp>

using Source = boost::string_view;
using Location = Source::const_iterator;

完成提示

除了AST,我们还希望解析器生成完成提示(隐式完成标记和建议)

Completion Hints

Besides the AST, we want our parser to generate completion hints (the implicit completion tokens and suggestions)

namespace Completion {

    using Candidates = std::vector<std::string>;

    class Hints {
        struct ByLocation {
            template <typename T, typename U>
            bool operator()(T const& a, U const& b) const { return loc(a) < loc(b); }
          private:
            static Location loc(Source const& s) { return s.begin(); }
            static Location loc(Location const& l) { return l; }
        };

      public:
        std::map<Location, std::string, ByLocation> incomplete;
        std::map<Source, Candidates, ByLocation> suggestions;

        /*explicit*/ operator bool() const { return incomplete.size() || suggestions.size(); }
    };

此外,让我们编写一个快速的&肮脏的模糊标识符匹配评分功能.

In addition, let's code up a quick & dirty fuzzy identifier match scoring function.

我选择了一个简单的同步比较

I opted for a simple synchronizing compare that

  • 逐步对相应的字符进行评分,并且
  • 倾向于跳过候选字符而不是跳过输入字符(这意味着adj_diffadjacent_difference的良好模糊匹配,即使已经从候选字符中跳过了字符,但adj_qqq_diff却更糟糕,因为qqq输入的内容无法匹配)
  • 该算法是递归完成的,无需分配
  • 如果匹配,
  • 第一个字符会提升(第一次调用时为rate=1)
  • scores corresponding runs of characters progressively, and
  • favours skipping characters from candidates over skipping characters from the input (meaning that adj_diff is a good fuzzy match for adjacent_difference even though characters have been skipped from the candidate, but adj_qqq_diff is worse because the qqq from the input could not be matched)
  • the algorithm is done recursively and without allocations
  • first characters get a boost if matched (rate=1 at first invocation)
    static int fuzzy_match(Source input, boost::string_view candidate, int rate = 1) { // start with first-letter boost
        int score = 0;

        while (!(input.empty() || candidate.empty())) {
            if (input.front() != candidate.front()) {
                return score + std::max(
                    fuzzy_match(input.substr(1), candidate, std::max(rate-2,0)), //penalty for ignoring an input char
                    fuzzy_match(input, candidate.substr(1), std::max(rate-1,0)));
            }

            input.remove_prefix(1);
            candidate.remove_prefix(1);
            score += ++rate;
        }
        return score;
    }

} // namespace Completion

我们将看看语法中如何使用它.

We'll see how this is used in the grammar.

常规AST,可以执行二进制表达式,字符串/数字文字,变量和(嵌套的)函数调用:

A run-of-the-mill AST that can do binary expressions, string/numeric literals, variables and (nested) function calls:

#include <boost/variant.hpp>

namespace Ast {
    using NumLiteral = double;
    using StringLiteral = std::string; // de-escaped, not source view

    struct Identifier : std::string {
        using std::string::string;
        using std::string::operator=;
    };

    struct BinaryExpression;
    struct CallExpression;

    using Expression = boost::variant<
            NumLiteral,
            StringLiteral,
            Identifier,
            boost::recursive_wrapper<BinaryExpression>,
            boost::recursive_wrapper<CallExpression>
        >;

    struct BinaryExpression {
        Expression lhs;
        char op;
        Expression rhs;
    };

    using ArgList = std::vector<Expression>;

    struct CallExpression {
        Identifier function;
        ArgList args;
    };
}

语法


基础

语法也从基本"开始:

Grammar


Basics

The grammar starts off pretty "basic" as well:

namespace Parsing {
    namespace qi  = boost::spirit::qi;
    namespace phx = boost::phoenix;

    template <typename It>
    struct Expression : qi::grammar<It, Ast::Expression()> {
        Expression(Completion::Hints& hints) : Expression::base_type(start), _hints(hints) {
            using namespace qi;

            start      = skip(space) [expression];

            expression = term   [_val = _1] >> *(char_("-+") >> expression) [_val = make_binary(_val, _1, _2)];
            term       = factor [_val = _1] >> *(char_("*/") >> term)       [_val = make_binary(_val, _1, _2)];
            factor     = simple [_val = _1] >> *(char_("^")  >> factor)     [_val = make_binary(_val, _1, _2)];

            simple     = call | variable | compound | number | string;

到目前为止,

很好:构造函数存储对要记录的Completion::Hints&的引用.所有这些规则都已声明为qi::rule<It, Expression(), qi::space_type>.

So far so good: The constructor stores a reference to the Completion::Hints& to be recorded. All these rules have been declared as qi::rule<It, Expression(), qi::space_type>.

现在变得更加有趣了,这里的一些规则是词素¹

Now it gets slightly more interesting, some rules here are lexemes¹

            number     = double_;
            identifier = raw[(alpha|'_') >> *(alnum|'_')];

有些作品可以容忍缺少结束标记:

And some productions are tolerant of missing closing tokens:

            // imply the closing quotes
            string   %= '"' >> *('\\' >> char_ | ~char_('"')) >> implied('"');
            compound %= '(' >> expression >> implied(')');

implied魔术使之成为可能,因此,如果缺少预期的结束标记,它将被记录为提示,并继续进行解析:

The implied magic makes it so that if the expected closing token is missing, it will be recorded as a hint, and parsing continues:

            auto implied = [=](char ch) {
                return copy(omit[lit(ch) | raw[eps][imply(_1, ch)]]);
            };

当然,这引出了imply(_1, ch)真正作用的问题,我们将在后面看到.

Of course, this begs the question what imply(_1, ch) really does, and we'll see later.

现在,请观察我们执行raw[eps]来获取(空)源iterator_range,以从语义动作中构造一个Location.

For now, observe that we do raw[eps] to get an (empty) source iterator_range to construct a Location from in the semantic action.

标识符查找

我们将符号表"存储在Domain成员_variables_functions中:

        using Domain = qi::symbols<char>;
        Domain _variables, _functions;

然后,我们声明一些可以对它们之一进行查找的规则:

Then we declare some rules that can do lookups on either of them:

        // domain identifier lookups
        qi::_r1_type _domain;
        qi::rule<It, Ast::Identifier(Domain const&)> maybe_known, known, unknown;

相应的声明将很快显示.

The corresponding declarations will be shown shortly.

变量非常简单:

        variable   = maybe_known(phx::ref(_variables));

通话更加棘手.如果名称未知,我们不希望它暗示一个 function ,除非其后跟一个'('字符.但是,如果标识符是已知的函数名,我们甚至要暗示((这使UX具有自动补全的外观,当用户键入sqrt时,它会提示下一个字符神奇地是()

Calls are trickier. If a name is unknown we don't want to assume it implies a function unless it's followed by a '(' character. However, if an identifier is a known function name, we want even to imply the ( (this gives the UX the appearance of autocompletion where when the user types sqrt, it suggests the next character to be ( magically).

        // The heuristics:
        // - an unknown identifier followed by (
        // - an unclosed argument list implies )
        call %= ( known(phx::ref(_functions)) // known -> imply the parens
                     | &(identifier >> '(') >> unknown(phx::ref(_functions))
                     ) >> implied('(') >> -(expression % ',') >> implied(')');

全部基于knownunknownmaybe_known:

            ///////////////////////////////
            // identifier loopkup, suggesting
            {
                maybe_known = known(_domain) | unknown(_domain);

                // distinct to avoid partially-matching identifiers
                using boost::spirit::repository::qi::distinct;
                auto kw     = distinct(copy(alnum | '_'));

                known       = raw[kw[lazy(_domain)]];
                unknown     = raw[identifier[_val=_1]] [suggest_for(_1, _domain)];
            }

调试,预定义的变量/函数

剩下的是一些繁琐的事情:

Debug, Predefined Variables/Functions

Remaining is a bit of red tape:

            BOOST_SPIRIT_DEBUG_NODES(
                (start)
                (expression)(term)(factor)(simple)(compound)
                (call)(variable)
                (identifier)(number)(string)
                //(maybe_known)(known)(unknown) // qi::symbols<> non-streamable
            )

            _variables += "foo", "bar", "qux";
            _functions += "print", "sin", "tan", "sqrt", "frobnicate";
        }

      private:
        Completion::Hints& _hints;

        using Domain = qi::symbols<char>;
        Domain _variables, _functions;

        qi::rule<It, Ast::Expression()> start;
        qi::rule<It, Ast::Expression(), qi::space_type> expression, term, factor, simple;
        // completables
        qi::rule<It, Ast::Expression(), qi::space_type> compound;
        qi::rule<It, Ast::CallExpression(), qi::space_type> call;

        // implicit lexemes
        qi::rule<It, Ast::Identifier()> variable, identifier;
        qi::rule<It, Ast::NumLiteral()> number;
        qi::rule<It, Ast::StringLiteral()> string;

        // domain identifier lookups
        qi::_r1_type _domain;
        qi::rule<It, Ast::Identifier(Domain const&)> maybe_known, known, unknown;

凤凰助手

让我们从通常的帮助器开始构建二进制AST节点:

Phoenix Helpers

Let's start with the usual helper to construct binary AST nodes:

        ///////////////////////////////
        // binary expression factory
        struct make_binary_f {
            Ast::BinaryExpression operator()(Ast::Expression const& lhs, char op, Ast::Expression const& rhs) const {
                return {lhs, op, rhs};
            }
        };
        boost::phoenix::function<make_binary_f> make_binary;

类似地,我们可以有一个Phoenix函数对象来注册隐含字符:

Similarly, we can have a Phoenix function object to register implied chars:

        ///////////////////////////////
        // auto-completing incomplete expressions
        struct imply_f {
            Completion::Hints& _hints;
            void operator()(boost::iterator_range<It> where, char implied_char) const {
                auto inserted = 
                    _hints.incomplete.emplace(&*where.begin(), std::string(1, implied_char));
                // add the implied char to existing completion
                if (!inserted.second)
                    inserted.first->second += implied_char;
            }
        };
        boost::phoenix::function<imply_f> imply { imply_f { _hints } };

请注意,它按位置排序(这使UX更加容易),并检测先前的隐式令牌何时位于同一位置,在这种情况下,我们只需将其附加到该位置即可.

Note that it orders by location (which makes UX easier) and detects when a previous implied token lived at the same location, in which case we simply append to it.

到现在为止,为未知标识符生成相关建议也是凤凰演员就不足为奇了:

By now it won't be much of a surprise that generating relevant suggestions for unknown identifiers is also a phoenix actor:

        ///////////////////////////////
        // suggest_for
        struct suggester {
            Completion::Hints& _hints;

            void operator()(boost::iterator_range<It> where, Domain const& symbols) const {
                using namespace Completion;
                Source id(&*where.begin(), where.size());
                Candidates c;

                symbols.for_each([&](std::string const& k, ...) { c.push_back(k); });

                auto score = [id](Source v) { return fuzzy_match(id, v); };
                auto byscore = [=](Source a, Source b) { return score(a) > score(b); };

                sort(c.begin(), c.end(), byscore);
                c.erase( remove_if(c.begin(), c.end(), [=](Source s) { return score(s) < 3; }), c.end());

                _hints.suggestions.emplace(id, c);
            }
        };
        boost::phoenix::function<suggester> suggest_for {suggester{_hints}};

仅此而已.如果看起来更复杂,那是因为它做得更多:它模糊地对所有应试者评分,按降序对他们排序,并删除所有得分小于<的候选人. 3.

That's all. If it looks more complicated, that's because it does a lot more: it scores all candidates fuzzily, orders them by descending score and removes any candidates with score < 3.

    };
}

奖金

让我们多做一些改动,并允许在参数列表中隐含',':

        call %= ( known(phx::ref(_functions)) // known -> imply the parens
                | &(identifier >> '(') >> unknown(phx::ref(_functions))
                ) 
            >> implied('(') 
            >> -(expression % ',')
            >> implied(')')
            ;

让我们在其中替换',':

            >> -(expression % (',' | !(')'|eoi) >> implied(',')))

注意:检测&expression断言表达式在后,而不是断言未到达输入/参数列表的末尾似乎更有意义.

NOTE: It might seem to make more sense to detect &expression to assert that an expression follows, instead asserting that the end of the input/argument list has not been reached.

这样做很不好,因为任何包含的表达式都可能触发更多的隐式标记,这是一个副作用,导致重复的隐式标记.

Doing that goes bad, though, because any contained expressions could trigger more implied tokens as a side effect, leading to duplicated implied tokens.

这个(副作用有效的语义动作)是我通常避免语义动作或仅使用规则的(暴露的)属性存储效果的主要原因之一.参见 Boost Spirit:语义行为是邪恶的"?

This (side-effectful semantic actions) is one of the chief reasons why I usually avoid semantic actions, or use them to store effect only in the rule's (exposed) attribute(s). See Boost Spirit: "Semantic actions are evil"?

测试驱动程序

如果没有一些不错的测试用例,这篇文章将一无所获.所以去了:

TEST DRIVER

This post would be nothing without some nice test cases. So here goes:

int main() {
    for (Source const input : {
            "",       // invalid
            "(3",     // incomplete, imply ')'
            "3*(6+sqrt(9))^7 - 1e8", // completely valid
            "(3*(((6+sqrt(9))^7 - 1e8", // incomplete, imply ")))"
            "print(\"hello \\\"world!", // completes the string literal and the parameter list
            "foo",    // okay, known variable
            "baz",    // (suggest bar)
            "baz(",   // incomplete, imply ')', unknown function
            "taz(",   // incomplete, imply ')', unknown function
            "san(",   // 2 suggestions (sin/tan)
            "print(1, 2, \"three\", complicated(san(78",
            "(print sqrt sin 9)    -0) \"bye",
        })
    {
        std::cout << "-------------- '" << input << "'\n";
        Location f = input.begin(), l = input.end();

        Ast::Expression expr;
        Completion::Hints hints;
        bool ok = parse(f, l, Parsing::Expression<Location>{hints}, expr);

        if (hints) {
            std::cout << "Input: '" << input << "'\n";
        }
        for (auto& c : hints.incomplete) {
            std::cout << "Missing " << std::setw(c.first - input.begin()) << "" << "^ implied: '" << c.second << "'\n";
        }
        for (auto& id : hints.suggestions) {
            std::cout << "Unknown " << std::setw(id.first.begin() - input.begin()) << "" << std::string(id.first.size(), '^');
            if (id.second.empty()) 
                std::cout << " (no suggestions)\n";
            else {
                std::cout << " (did you mean ";
                once_t first;
                for (auto& s : id.second)
                    std::cout << (first?"":" or ") << "'" << s << "'";
                std::cout << "?)\n";
            }
        }

        if (ok) {
            std::cout << "AST:    " << expr << "\n";
        } else {
            std::cout << "Parse failed\n";
        }

        if (f!=l)
            std::cout << "Remaining input: '" << std::string(f,l) << "'\n";
    }
}

请注意,除了第一个输入("")之外,所有内容都经过启发式解析为 some 这种表达!最后一个旨在显示所有参数列表的含义,因为printsqrtsin是已知函数.然后隐含一些',',最后关闭未封闭的字符串文字和剩余的括号.这是(非调试)输出:

Note that, besides the first input ("") everything is heuristically parsed as some kind of expression! The last one is designed to show off how all the parameter lists are implied because print, sqrt and sin are known functions. Then some ',' are implied, before finally the unclosed string literal and remaining parentheses are closed. Here's the (non-debug) output:

-------------- ''
Parse failed
-------------- '(3'
Input: '(3'
Missing   ^ implied: ')'
AST:    3
-------------- '3*(6+sqrt(9))^7 - 1e8'
AST:    ((3 * ((6 + (sqrt (9))) ^ 7)) - 1e+08)
-------------- '(3*(((6+sqrt(9))^7 - 1e8'
Input: '(3*(((6+sqrt(9))^7 - 1e8'
Missing                         ^ implied: ')))'
AST:    (3 * (((6 + (sqrt (9))) ^ 7) - 1e+08))
-------------- 'print("hello \"world!'
Input: 'print("hello \"world!'
Missing                      ^ implied: '")'
AST:    (print (hello "world!))
-------------- 'foo'
AST:    foo
-------------- 'baz'
Input: 'baz'
Unknown ^^^ (did you mean 'bar'?)
AST:    baz
-------------- 'baz('
Input: 'baz('
Missing     ^ implied: ')'
Unknown ^^^ (no suggestions)
AST:    (baz ())
-------------- 'taz('
Input: 'taz('
Missing     ^ implied: ')'
Unknown ^^^ (did you mean 'tan'?)
AST:    (taz ())
-------------- 'san('
Input: 'san('
Missing     ^ implied: ')'
Unknown ^^^ (did you mean 'sin' or 'tan'?)
AST:    (san ())
-------------- 'print(1, 2, "three", complicated(san(78'
Input: 'print(1, 2, "three", complicated(san(78'
Missing                                        ^ implied: ')))'
Unknown                      ^^^^^^^^^^^ (did you mean 'frobnicate' or 'print'?)
Unknown                                  ^^^ (did you mean 'sin' or 'tan'?)
AST:    (print (1, 2, three, (complicated ((san (78))))))
-------------- '(print sqrt sin 9)    -0) "bye'
Input: '(print sqrt sin 9)    -0) "bye'
Missing        ^ implied: '('
Missing             ^ implied: '('
Missing                 ^ implied: '('
Missing                           ^ implied: ','
Missing                               ^ implied: '"))'
AST:    (print ((sqrt (((sin (9)) - 0))), bye))

完整列表/现场演示

在Coliru上直播

//#define BOOST_SPIRIT_DEBUG
#include <boost/utility/string_view.hpp>

using Source = boost::string_view;
using Location = Source::const_iterator;

#include <map>
#include <vector>

namespace Completion {

    static int fuzzy_match(Source input, boost::string_view candidate, int rate = 1) { // start with first-letter boost
        int score = 0;

        while (!(input.empty() || candidate.empty())) {
            if (input.front() != candidate.front()) {
                return score + std::max(
                    fuzzy_match(input.substr(1), candidate, std::max(rate-2,0)), //penalty for ignoring an input char
                    fuzzy_match(input, candidate.substr(1), std::max(rate-1,0)));
            }

            input.remove_prefix(1);
            candidate.remove_prefix(1);
            score += ++rate;
        }
        return score;
    }

    using Candidates = std::vector<std::string>;

    class Hints {
        struct ByLocation {
            template <typename T, typename U>
            bool operator()(T const& a, U const& b) const { return loc(a) < loc(b); }
          private:
            static Location loc(Source const& s) { return s.begin(); }
            static Location loc(Location const& l) { return l; }
        };

      public:
        std::map<Location, std::string, ByLocation> incomplete;
        std::map<Source, Candidates, ByLocation> suggestions;

        /*explicit*/ operator bool() const { return incomplete.size() || suggestions.size(); }
    };
}

#include <boost/variant.hpp>

namespace Ast {
    using NumLiteral = double;
    using StringLiteral = std::string; // de-escaped, not source view

    struct Identifier : std::string {
        using std::string::string;
        using std::string::operator=;
    };

    struct BinaryExpression;
    struct CallExpression;

    using Expression = boost::variant<
            NumLiteral,
            StringLiteral,
            Identifier,
            boost::recursive_wrapper<BinaryExpression>,
            boost::recursive_wrapper<CallExpression>
        >;

    struct BinaryExpression {
        Expression lhs;
        char op;
        Expression rhs;
    };

    using ArgList = std::vector<Expression>;

    struct CallExpression {
        Identifier function;
        ArgList args;
    };
}

#include <boost/fusion/adapted/struct.hpp>
BOOST_FUSION_ADAPT_STRUCT(Ast::BinaryExpression, lhs, op, rhs)
BOOST_FUSION_ADAPT_STRUCT(Ast::CallExpression, function, args)

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>

// for debug printing:
namespace {
    struct once_t { // an auto-reset flag
        operator bool() { bool v = flag; flag = false; return v; }
        bool flag = true;
    };
}

// for debug printing:
namespace Ast {

    static inline std::ostream& operator<<(std::ostream& os, std::vector<Expression> const& args) {
        os << "(";
        once_t first;
        for (auto& a : args) os << (first?"":", ") << a;
        return os << ")";
    }

    static inline std::ostream& operator<<(std::ostream& os, BinaryExpression const& e) {
        return os << boost::fusion::as_vector(e);
    }
    static inline std::ostream& operator<<(std::ostream& os, CallExpression const& e) {
        return os << boost::fusion::as_vector(e);
    }
}

namespace Parsing {
    namespace qi  = boost::spirit::qi;
    namespace phx = boost::phoenix;

    template <typename It>
    struct Expression : qi::grammar<It, Ast::Expression()> {
        Expression(Completion::Hints& hints) : Expression::base_type(start), _hints(hints) {
            using namespace qi;

            start      = skip(space) [expression];

            expression = term   [_val = _1] >> *(char_("-+") >> expression) [_val = make_binary(_val, _1, _2)];
            term       = factor [_val = _1] >> *(char_("*/") >> term)       [_val = make_binary(_val, _1, _2)];
            factor     = simple [_val = _1] >> *(char_("^")  >> factor)     [_val = make_binary(_val, _1, _2)];

            simple     = call | variable | compound | number | string;

            auto implied = [=](char ch) {
                return copy(omit[lit(ch) | raw[eps][imply(_1, ch)]]);
            };

            variable   = maybe_known(phx::ref(_variables));

            compound  %= '(' >> expression >> implied(')');

            // The heuristics:
            // - an unknown identifier followed by (
            // - an unclosed argument list implies )
            call %= ( known(phx::ref(_functions)) // known -> imply the parens
                    | &(identifier >> '(') >> unknown(phx::ref(_functions))
                    ) 
                >> implied('(') 
                >> -(expression % (',' | !(')'|eoi) >> implied(',')))
                >> implied(')')
                ;

            // lexemes, primitive rules
            identifier  = raw[(alpha|'_') >> *(alnum|'_')];

            // imply the closing quotes
            string     %= '"' >> *('\\' >> char_ | ~char_('"')) >> implied('"'); // TODO more escapes

            number      = double_; // TODO integral arguments

            ///////////////////////////////
            // identifier loopkup, suggesting
            {
                maybe_known = known(_domain) | unknown(_domain);

                // distinct to avoid partially-matching identifiers
                using boost::spirit::repository::qi::distinct;
                auto kw     = distinct(copy(alnum | '_'));

                known       = raw[kw[lazy(_domain)]];
                unknown     = raw[identifier[_val=_1]] [suggest_for(_1, _domain)];
            }

            BOOST_SPIRIT_DEBUG_NODES(
                (start)
                (expression)(term)(factor)(simple)(compound)
                (call)(variable)
                (identifier)(number)(string)
                //(maybe_known)(known)(unknown) // qi::symbols<> non-streamable
            )

            _variables += "foo", "bar", "qux";
            _functions += "print", "sin", "tan", "sqrt", "frobnicate";
        }

      private:
        Completion::Hints& _hints;

        using Domain = qi::symbols<char>;
        Domain _variables, _functions;

        qi::rule<It, Ast::Expression()> start;
        qi::rule<It, Ast::Expression(), qi::space_type> expression, term, factor, simple;
        // completables
        qi::rule<It, Ast::Expression(), qi::space_type> compound;
        qi::rule<It, Ast::CallExpression(), qi::space_type> call;

        // implicit lexemes
        qi::rule<It, Ast::Identifier()> variable, identifier;
        qi::rule<It, Ast::NumLiteral()> number;
        qi::rule<It, Ast::StringLiteral()> string;

        // domain identifier lookups
        qi::_r1_type _domain;
        qi::rule<It, Ast::Identifier(Domain const&)> maybe_known, known, unknown;

        ///////////////////////////////
        // binary expression factory
        struct make_binary_f {
            Ast::BinaryExpression operator()(Ast::Expression const& lhs, char op, Ast::Expression const& rhs) const {
                return {lhs, op, rhs};
            }
        };
        boost::phoenix::function<make_binary_f> make_binary;

        ///////////////////////////////
        // auto-completing incomplete expressions
        struct imply_f {
            Completion::Hints& _hints;
            void operator()(boost::iterator_range<It> where, char implied_char) const {
                auto inserted = 
                    _hints.incomplete.emplace(&*where.begin(), std::string(1, implied_char));
                // add the implied char to existing completion
                if (!inserted.second)
                    inserted.first->second += implied_char;
            }
        };
        boost::phoenix::function<imply_f> imply { imply_f { _hints } };

        ///////////////////////////////
        // suggest_for
        struct suggester {
            Completion::Hints& _hints;

            void operator()(boost::iterator_range<It> where, Domain const& symbols) const {
                using namespace Completion;
                Source id(&*where.begin(), where.size());
                Candidates c;

                symbols.for_each([&](std::string const& k, ...) { c.push_back(k); });

                auto score = [id](Source v) { return fuzzy_match(id, v); };
                auto byscore = [=](Source a, Source b) { return score(a) > score(b); };

                sort(c.begin(), c.end(), byscore);
                c.erase( remove_if(c.begin(), c.end(), [=](Source s) { return score(s) < 3; }), c.end());

                _hints.suggestions.emplace(id, c);
            }
        };
        boost::phoenix::function<suggester> suggest_for {suggester{_hints}};

    };
}

#include <iostream>
#include <iomanip>

int main() {
    for (Source const input : {
            "",       // invalid
            "(3",     // incomplete, imply ')'
            "3*(6+sqrt(9))^7 - 1e8", // completely valid
            "(3*(((6+sqrt(9))^7 - 1e8", // incomplete, imply ")))"
            "print(\"hello \\\"world!", // completes the string literal and the parameter list
            "foo",    // okay, known variable
            "baz",    // (suggest bar)
            "baz(",   // incomplete, imply ')', unknown function
            "taz(",   // incomplete, imply ')', unknown function
            "san(",   // 2 suggestions (sin/tan)
            "print(1, 2, \"three\", complicated(san(78",
            "(print sqrt sin 9)    -0) \"bye",
        })
    {
        std::cout << "-------------- '" << input << "'\n";
        Location f = input.begin(), l = input.end();

        Ast::Expression expr;
        Completion::Hints hints;
        bool ok = parse(f, l, Parsing::Expression<Location>{hints}, expr);

        if (hints) {
            std::cout << "Input: '" << input << "'\n";
        }
        for (auto& c : hints.incomplete) {
            std::cout << "Missing " << std::setw(c.first - input.begin()) << "" << "^ implied: '" << c.second << "'\n";
        }
        for (auto& id : hints.suggestions) {
            std::cout << "Unknown " << std::setw(id.first.begin() - input.begin()) << "" << std::string(id.first.size(), '^');
            if (id.second.empty()) 
                std::cout << " (no suggestions)\n";
            else {
                std::cout << " (did you mean ";
                once_t first;
                for (auto& s : id.second)
                    std::cout << (first?"":" or ") << "'" << s << "'";
                std::cout << "?)\n";
            }
        }

        if (ok) {
            std::cout << "AST:    " << expr << "\n";
        } else {
            std::cout << "Parse failed\n";
        }

        if (f!=l)
            std::cout << "Remaining input: '" << std::string(f,l) << "'\n";
    }
}


¹ Boost Spirit队长问题

这篇关于如何为用户提供给定boost :: spirit语法的自动完成建议?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆