How to split a sentence with an escaped whitespace?


Question

I want to split my sentence using whitespace as my delimiter except for escaped whitespaces. Using boost::split and regex, how can I split it? If not possible, how else?

Example:

std::string sentence = "My dog Fluffy\\ Cake likes to jump";

Result:
My
dog
Fluffy\ Cake
likes
to
jump

Recommended answer

Three implementations:

  1. With Boost Spirit
  2. With Boost Regex
  3. Handwritten parser

With Boost Spirit

Here's how I'd do this with Boost Spirit. This might seem overkill, but experience teaches me that once you're splitting input text you will likely require more parsing logic.

Boost Spirit shines when you scale from "just splitting tokens" to a real grammar with production rules.

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;

int main() {
    std::string const sentence = "My dog Fluffy\\ Cake likes to jump";
    using It = std::string::const_iterator;
    It f = sentence.begin(), l = sentence.end();

    std::vector<std::string> words;

    bool ok = qi::phrase_parse(f, l,
            *qi::lexeme [ +('\\' >> qi::char_ | qi::graph) ], // words
            qi::space - "\\ ", // skipper
            words);

    if (ok) {
        std::cout << "Parsed:\n";
        for (auto& w : words)
            std::cout << "\t'" << w << "'\n";
    } else {
        std::cout << "Parse failed\n";
    }

    if (f != l)
        std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}

With Boost Regex

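This looks really succinct, but note that it only splits: the backslash stays in the token (Fluffy\ Cake), and consecutive whitespace characters produce empty tokens.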

Live On Coliru

#include <iostream>
#include <string>
#include <vector>
#include <boost/regex.hpp>
#include <boost/algorithm/string_regex.hpp>

int main() {
    std::string const sentence = "My dog Fluffy\\ Cake likes to jump";

    std::vector<std::string> words;
    boost::algorithm::split_regex(words, sentence, boost::regex("(?<!\\\\)\\s"), boost::match_default);

    for (auto& w : words)
        std::cout << " '" << w << "'\n";
}

Using C++11 raw string literals, you could write the regular expression slightly less obscurely: boost::regex(R"((?<!\\)\s)"), meaning "any whitespace not preceded by a backslash".

Handwritten parser

This is somewhat more tedious, but like the Spirit grammar it is completely generic, and it allows good performance.

However, it doesn't scale nearly as gracefully as the Spirit approach once you start adding complexity to your grammar. An advantage is that you spend less time compiling the code than with the Spirit version.

Live On Coliru

#include <iostream>
#include <iterator>
#include <string>
#include <vector>

template <typename It, typename Out>
Out tokens(It f, It l, Out out) {
    std::string accum;
    auto flush = [&] { 
        if (!accum.empty()) {
            *out++ = accum;
            accum.resize(0);
        }
    };

    while (f!=l) {
        switch(*f) {
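            case '\\':
                if (++f!=l && *f==' ')
                    accum += *f++;  // escaped space: keep it and consume it, so it doesn't end the token
                else
                    accum += '\\';  // any other backslash is kept literally
                break;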
            case ' ': case '\t': case '\r': case '\n':
                ++f;
                flush();
                break;
            default:
                accum += *f++;
        }
    }
    flush();
    return out;
}

int main() {
    std::string const sentence = "My dog Fluffy\\ Cake likes to jump";

    std::vector<std::string> words;

    tokens(sentence.begin(), sentence.end(), std::back_inserter(words));

    for (auto& w : words)
        std::cout << "\t'" << w << "'\n";
}
