How to make my split work only on one real line and be capable of skipping quoted parts of a string?

Problem description

So we have a simple split:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;

vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}

or Boost's split. And we have a simple main like:

int main() {
    const vector<string> words = split("close no \"\n matter\" how \n far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}

How can we make it output something like:

close 
no
"\n matter"
how
end symbol found.

We want to teach split about two things: structures that shall be kept unsplit (quoted parts) and characters that shall end the parsing process. How can we do this?

Recommended answer

The following code:

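#include <cassert>  // assert() is used below; the other headers are the ones the question's snippet includes

// Returns an iterator to the first non-empty symbol that matches s at position i,
// or symbols.end() if none matches.
vector<string>::const_iterator matchSymbol(const string & s, string::const_iterator i, const vector<string> & symbols)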
{
    vector<string>::const_iterator testSymbol;
    for (testSymbol=symbols.begin();testSymbol!=symbols.end();++testSymbol) {
        if (!testSymbol->empty()) {
            if (0==testSymbol->compare(0,testSymbol->size(),&(*i),testSymbol->size())) {
                return testSymbol;
            }
        }
    }

    assert(testSymbol==symbols.end());
    return testSymbol;
}

vector<string> split(const string& s, const vector<string> & delims, const vector<string> & terms, const bool keep_empty = true)
{
    vector<string> result;
    if (delims.empty()) {
        result.push_back(s);
        return result;
    }

    bool checkForDelim=true;

    string temp;
    string::const_iterator i=s.begin();
    while (i!=s.end()) {
        vector<string>::const_iterator testTerm=terms.end();
        vector<string>::const_iterator testDelim=delims.end();

        if (checkForDelim) {
            testTerm=matchSymbol(s,i,terms);
            testDelim=matchSymbol(s,i,delims);
        }

        if (testTerm!=terms.end()) {
            i=s.end();
        } else if (testDelim!=delims.end()) {
            if (!temp.empty() || keep_empty) {
                result.push_back(temp);
                temp.clear();
            }
            string::const_iterator j=testDelim->begin();
            while (i!=s.end() && j!=testDelim->end()) {
                ++i;
                ++j;
            }
        } else if ('"'==*i) {
            if (checkForDelim) {
                string::const_iterator j=i;
                do {
                    ++j;
                } while (j!=s.end() && '"'!=*j);
                checkForDelim=(j==s.end());
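                // Flush the pending token only when the quote is matched;
                // parentheses make the intended precedence explicit.
                if (!checkForDelim && (!temp.empty() || keep_empty)) {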
                    result.push_back(temp);
                    temp.clear();
                }
                temp.push_back('"');
                ++i;
            } else {
                //matched end quote
                checkForDelim=true;
                temp.push_back('"');
                ++i;
                result.push_back(temp);
                temp.clear();
            }
        } else if ('\n'==*i) {
            temp+="\\n";
            ++i;
        } else {
            temp.push_back(*i);
            ++i;
        }
    }

    if (!temp.empty() || keep_empty) {
        result.push_back(temp);
    }
    return result;
}

int runTest()
{
    vector<string> delims;
    delims.push_back(" ");
    delims.push_back("\t");
    delims.push_back("\n");
    delims.push_back("split_here");

    vector<string> terms;
    terms.push_back(">");
    terms.push_back("end_here");

    const vector<string> words = split("close no \"\n end_here matter\" how \n far testsplit_heretest\"another split_here test\"with some\"mo>re", delims, terms, false);

    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
    return 0;  // runTest is declared to return int
}

produces:

close
no
"\n end_here matter"
how
far
test
test
"another split_here test"
with
some"mo

Based on the examples you gave, you seem to want newlines to count as delimiters when they appear outside of quotes and to be represented by the literal \n when inside of quotes, so that's what this does. It also adds the ability to have multiple delimiters, such as split_here, which I used in the test.

I wasn't sure whether you want unmatched quotes to split tokens the way matched quotes do, since the example you gave has the unmatched quote separated by spaces. This code treats an unmatched quote like any other character, but it should be easy to modify if this is not the behavior you want.
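The matched-versus-unmatched decision comes down to a lookahead from the opening quote to the end of the string. A minimal sketch of that check in isolation, assuming a hypothetical helper named hasClosingQuote (it is not part of the answer's code, which does the same scan inline with a do/while loop):

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper, not part of the original answer.
// Returns true when the character at `open` is a double quote and another
// double quote appears somewhere after it in `s`.
bool hasClosingQuote(const std::string& s, std::string::const_iterator open) {
    assert(open != s.end());          // caller must pass a dereferenceable iterator
    if (*open != '"') {
        return false;                 // not positioned on a quote at all
    }
    // Scan the remainder of the string for the next quote character.
    return std::find(open + 1, s.end(), '"') != s.end();
}
```

To make an unmatched quote split tokens as well, the branch that handles '"' could consult a check like this and flush the pending token in both cases; whether that is desirable depends on the input format.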

The line:

if (0==testSymbol->compare(0,testSymbol->size(),&(*i),testSymbol->size())) {

will work on most, if not all, implementations of the STL, but it is not guaranteed to work. It can be replaced with this safer, but slower, version:

if (*testSymbol==s.substr(i-s.begin(),testSymbol->size())) {
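A third option, offered here only as a sketch, is to compare in the other direction: std::string::compare(pos, len, str) compares a substring of s against the symbol, clamps len to the characters that remain in s, and does not build a temporary string the way substr does. The names matchesAt and matchSymbolSafe are hypothetical and do not appear in the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the original answer.
// True when the non-empty `symbol` occurs in `s` starting at offset `pos`.
bool matchesAt(const std::string& s, std::size_t pos, const std::string& symbol) {
    assert(pos <= s.size());  // compare() throws std::out_of_range for pos > size()
    if (symbol.empty()) {
        return false;         // mirrors the answer's rule of skipping empty symbols
    }
    // The length argument is clamped to s.size() - pos, so a symbol longer
    // than the remaining text simply compares unequal.
    return s.compare(pos, symbol.size(), symbol) == 0;
}

// Hypothetical variant of the answer's matchSymbol with the same contract:
// returns the first matching symbol, or symbols.end() when none matches.
std::vector<std::string>::const_iterator
matchSymbolSafe(const std::string& s, std::string::const_iterator i,
                const std::vector<std::string>& symbols) {
    const std::size_t pos = static_cast<std::size_t>(i - s.begin());
    std::vector<std::string>::const_iterator it = symbols.begin();
    for (; it != symbols.end(); ++it) {
        if (matchesAt(s, pos, *it)) {
            return it;
        }
    }
    return it;  // equals symbols.end() here
}
```

For example, matchesAt("abc split_here", 4, "split_here") is true, while matchesAt("abc spl", 4, "split_here") is false because only three characters remain at that offset.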
