What is the best algorithm for arbitrary delimiter/escape character processing?
Problem description
I'm a little surprised that there isn't more information about this on the web, and I keep finding that the problem is a little stickier than I thought.
Here are the rules:
- You are starting with delimited/escaped data to split into an array.
- The delimiter is one arbitrary character
- The escape character is one arbitrary character
- Both the delimiter and the escape character could occur in data
- Regex is fine, but a good-performance solution is best
- Edit: Empty elements (including leading or ending delimiters) can be ignored
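Since the rules allow a regex, here is a minimal sketch of that route in C++ (the same language as the state-machine code further down). It assumes single-byte characters, and `regexSplit` and `hexEscape` are hypothetical names. The pattern says a token is one or more of "an escape followed by any character" or "any character that is neither the escape nor the delimiter"; each match is then unescaped. Both characters are written as `\xHH` escapes so that metacharacters such as `.`, `]` or `\` work as delimiter or escape.

```cpp
#include <cstddef>
#include <cstdio>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: render one character as an ECMAScript "\xHH" escape
// so it is matched literally, both inside and outside a bracket expression.
static std::string hexEscape(char c)
{
    char buf[5];  // backslash, 'x', two hex digits, terminating NUL
    std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned char>(c));
    return buf;
}

// Hypothetical regex-based splitter. Empty elements never match (the pattern
// needs at least one character), and a trailing unpaired escape is dropped.
std::vector<std::string> regexSplit(const std::string& text, char delim, char esc)
{
    const std::string e = hexEscape(esc), d = hexEscape(delim);
    // (?:ESC[\s\S] | [^ESC DELIM])+  ; [\s\S] also matches line breaks
    const std::regex token("(?:" + e + "[\\s\\S]|[^" + e + d + "])+");

    std::vector<std::string> tokens;
    for (std::sregex_iterator it(text.begin(), text.end(), token), end; it != end; ++it)
    {
        const std::string raw = it->str();
        std::string out;
        for (std::size_t i = 0; i < raw.size(); ++i)
        {
            // Inside a match every escape is followed by its escaped char:
            // skip the escape and copy that char without reinterpreting it.
            if (raw[i] == esc && i + 1 < raw.size())
                ++i;
            out += raw[i];
        }
        tokens.push_back(out);
    }
    return tokens;
}
```

For example, `regexSplit("a.b\\.c", '.', '\\')` splits on unescaped dots and yields `a` and `b.c`. `std::regex` is typically much slower than a hand-written loop, so this is better suited as a readable cross-check than as the "good-performance" answer.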
The code signature (in C#) would basically be:
```csharp
public static string[] smartSplit(
    string delimitedData,
    char delimiter,
    char escape) {}
```
The stickiest part of the problem is, of course, the case of consecutive escape characters. Calling `/` the escape character and `,` the delimiter: `////////,` = `////,` (the eight escapes pair off into four literal slashes, so the comma is still a live delimiter).
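The pairing in that example can be stated as a parity rule: a character is escaped exactly when it is preceded by an odd-length run of escape characters, because the run always starts in the unescaped state and its members cancel in pairs. The sketch below checks that rule by scanning backwards; `isEscapedAt` is a hypothetical helper, not part of the question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: is the character at `pos` escaped?
// Counts the run of escape characters immediately before `pos`.
// Odd run: the last escape is live, so text[pos] is literal.
// Even run: the escapes pair off into literal escape characters.
bool isEscapedAt(const std::string& text, std::size_t pos, char esc)
{
    std::size_t run = 0;
    while (run < pos && text[pos - 1 - run] == esc)
        ++run;
    return run % 2 == 1;
}
```

With `/` as the escape, the comma in `////////,` (run of 8) is a real delimiter, while the comma in `///,` (run of 3) is literal. Calling this per candidate delimiter rescans escape runs, so a single left-to-right pass that tracks whether the previous character was a live escape is usually the cheaper way to apply the same rule.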
Am I missing a place where this is handled on the web, or in another SO question? If not, put your big brains to work... I think this problem is something that would be nice to have on SO for the public good. I'm working on it myself, but don't have a good solution yet.
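```cpp
#include <cstddef>
#include <string>
#include <vector>

using std::string;
using std::vector;

// Splits `text` on `delim`; the character after `esc` is always literal,
// so esc+esc gives one esc and esc+delim gives a literal delim.
// Empty fragments are skipped; a trailing, unpaired escape is dropped.
void smartSplit(string const& text, char delim, char esc, vector<string>& tokens)
{
    enum State { NORMAL, IN_ESC };
    State state = NORMAL;
    string frag;  // token being accumulated
    for (std::size_t i = 0; i < text.length(); ++i)
    {
        char c = text[i];
        switch (state)
        {
        case NORMAL:
            if (c == delim)
            {
                // Unescaped delimiter: close the current token.
                if (!frag.empty())
                    tokens.push_back(frag);
                frag.clear();
            }
            else if (c == esc)
                state = IN_ESC;  // next character is taken literally
            else
                frag.append(1, c);
            break;
        case IN_ESC:
            frag.append(1, c);   // escaped character, kept as-is
            state = NORMAL;
            break;
        }
    }
    if (!frag.empty())
        tokens.push_back(frag);
}
```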
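The state machine above can also be written to return its result, closer to the `string[] smartSplit(...)` shape from the question. This is a sketch with a hypothetical name, `smartSplitValue`; instead of a separate state variable, an escape simply consumes the character after it, which gives the same results whenever the delimiter and escape differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical return-by-value variant of smartSplit.
// Same rules: the char after `esc` is literal, empty elements are skipped,
// and a trailing unpaired escape is dropped.
std::vector<std::string> smartSplitValue(const std::string& text, char delim, char esc)
{
    std::vector<std::string> tokens;
    std::string frag;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        char c = text[i];
        if (c == esc)
        {
            if (i + 1 < text.size())
                frag += text[++i];  // consume and keep the escaped character
        }
        else if (c == delim)
        {
            if (!frag.empty())
                tokens.push_back(frag);
            frag.clear();
        }
        else
            frag += c;
    }
    if (!frag.empty())
        tokens.push_back(frag);
    return tokens;
}
```

For instance, `smartSplitValue("////////,x", ',', '/')` returns `////` and `x`, matching the `////////,` = `////,` case from the question. The loop touches each character once, so it stays linear in the input length.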