Creating a boost::spirit::x3 parser for quoted strings with escape sequence handling


Question

I need to create a parser for quoted strings for my custom language that will also properly handle escape sequences, which includes allowing escaped quotes within the string. This is my current string parser:

x3::lexeme[quote > *(x3::char_ - quote) > quote]

where quote is just a constant expression for '"'. It does no escape sequence handling whatsoever. I know about boost::spirit::classic::lex_escape_ch_p, but I've no idea how to use that with the boost::spirit::x3 tools (or in general). How could I create a parser that does this? The parser would have to recognize most escape sequences, such as common ones like '\n', '\t', and more complex stuff like hex, oct, and ansi escape sequences.

My apologies if there's something wrong with this post, it's my first time posting on SO.

Here is how I ended up implementing the parser:

x3::lexeme[quote > *(
    ("\\\"" >> &x3::char_) >> x3::attr(quote) | ~x3::char_(quote)
    ) > quote]
[handle_escape_sequences];

where handle_escape_sequences is a lambda:

auto handle_escape_sequences = [&](auto&& context) -> void {
    std::string& str = x3::_val(context);

    uint32_t i{};

    // Not static: a static lambda would keep references to the str and i of the first call only.
    auto replace = [&](const char replacement) -> void {
        str[i++] = replacement;
    };

    if (!classic::parse(std::begin(str), std::end(str), *classic::lex_escape_ch_p[replace]).full)
        throw Error{ "invalid literal" }; // invalid escape sequence most likely

    str.resize(i);
};

It does full ANSI escape sequence parsing, which means you can use it to do all sorts of fancy terminal manipulation like setting the text color, cursor position, etc. with it.

Here's the full definition of the rule as well as all of the stuff it depends on (I just picked everything related to it out of my code so that's why the result looks like proper spaghetti) in case someone happens to need it:

#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/include/classic_utility.hpp>
#include <cstdint>
#include <string>

using namespace boost::spirit;

#define RULE_DECLARATION(rule_name, attribute_type)                            \
inline namespace Tag { class rule_name ## _tag; }                              \
x3::rule<Tag::rule_name ## _tag, attribute_type, true> rule_name = #rule_name; \

#define SIMPLE_RULE_DEFINITION(rule_name, attribute_type, definition) \
RULE_DECLARATION(rule_name, attribute_type)                           \
auto rule_name ## _def = definition;                                  \
BOOST_SPIRIT_DEFINE(rule_name);

constexpr char quote = '"';


template <class Base, class>
struct Access_base_s : Base {
    using Base::Base, Base::operator=;
};

template <class Base, class Tag>
using Unique_alias_for = Access_base_s<Base, Tag>;


using String_literal = Unique_alias_for<std::string, class String_literal_tag>;

SIMPLE_RULE_DEFINITION(string_literal, String_literal,
    x3::lexeme[quote > *(
        ("\\\"" >> &x3::char_) >> x3::attr(quote) | ~x3::char_(quote)
        ) > quote]
    [handle_escape_sequences]
);

Recommended answer

I have many examples of this on this site¹

Let me start by simplifying your expression (~charset is likely more efficient than charset - exceptions):

x3::lexeme['"' > *~x3::char_('"') > '"']

Now, to allow escapes, we can decode them ad hoc:

auto qstring = x3::lexeme['"' > *(
         "\\n" >> x3::attr('\n')
       | "\\b" >> x3::attr('\b')
       | "\\f" >> x3::attr('\f')
       | "\\t" >> x3::attr('\t')
       | "\\v" >> x3::attr('\v')
       | "\\0" >> x3::attr('\0')
       | "\\r" >> x3::attr('\r')
       | "\\"  >> x3::char_("\"\\")
       | ~x3::char_('"')
   ) > '"'];

Alternatively you could use a symbols approach, either including or excluding the slash:

x3::symbols<char> escapes;
escapes.add
    ( "\\n", '\n')
    ( "\\b", '\b')
    ( "\\f", '\f')
    ( "\\t", '\t')
    ( "\\v", '\v')
    ( "\\0", '\0')
    ( "\\r", '\r')
    ( "\\\\", '\\')
    ( "\\\"", '"');

auto qstring = x3::lexeme['"' > *(escapes | ~x3::char_('"')) > '"'];

See it Live On Coliru as well.

I think I prefer the hand-rolled branches, because they give you flexibility to do e.g. hex/octal escapes (mind the conflict with \0 though):

       | "\\" >> x3::int_parser<char, 8, 1, 3>()
       | "\\x" >> x3::int_parser<char, 16, 2, 2>()

Which also works fine:

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
#include <string>

int main() {
    namespace x3 = boost::spirit::x3;

    auto qstring = x3::lexeme['"' > *(
             "\\n" >> x3::attr('\n')
           | "\\b" >> x3::attr('\b')
           | "\\f" >> x3::attr('\f')
           | "\\t" >> x3::attr('\t')
           | "\\v" >> x3::attr('\v')
           | "\\r" >> x3::attr('\r')
           | "\\"  >> x3::char_("\"\\")
           | "\\" >> x3::int_parser<char, 8, 1, 3>()
           | "\\x" >> x3::int_parser<char, 16, 2, 2>()
           | ~x3::char_('"')
       ) > '"'];

    for (std::string const input : { R"("\ttest\x41\x42\x43 \x031\x032\x033 \"hello\"\r\n")" }) {
        std::string output;
        auto f = begin(input), l = end(input);
        if (x3::phrase_parse(f, l, qstring, x3::blank, output)) {
            std::cout << "[" << output << "]\n";
        } else {
            std::cout << "Failed\n";
        }
        if (f != l) {
            std::cout << "Remaining unparsed: " << std::quoted(std::string(f,l)) << "\n";
        }
    }
}

Prints

[   testABC 123 "hello"
]


¹ Have a look at these
