What is the best algorithm for arbitrary delimiter/escape character processing?

Question

I'm a little surprised that there isn't more information on this on the web, and I keep finding that the problem is a little stickier than I thought.

Here are the rules:

  1. You are starting with delimited/escaped data to split into an array.
  2. The delimiter is one arbitrary character.
  3. The escape character is one arbitrary character.
  4. Both the delimiter and the escape character can occur in the data.
  5. Regex is fine, but a good-performance solution is best.
  6. Edit: Empty elements (including those produced by leading or trailing delimiters) can be ignored.

The code signature (in C#) would basically be:

public static string[] smartSplit(
                         string delimitedData, 
                         char delimiter, 
                         char escape) {}

The stickiest part of the problem is, of course, the case of consecutive escaped escape characters. Using / as the escape character and , as the delimiter: ////////, = ////, (the eight escape characters collapse to four literal ones, so the comma is a real delimiter).
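
One way to see why the consecutive-escape case works out: a character is escaped exactly when it is preceded by an odd-length run of escape characters, because escapes are consumed in pairs from the left. A minimal sketch of that parity check, assuming a hypothetical helper named `isEscaped` (not part of the question or answer), with `/` and `,` as in the example:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the character at `pos` is escaped,
// i.e. it is preceded by an odd-length run of `esc` characters.
// The run is counted backwards from `pos`; the character before the run
// (if any) is not an escape, so pairing restarts cleanly at the run's start.
bool isEscaped(const std::string& text, std::size_t pos, char esc)
{
    std::size_t run = 0;
    // Walk backwards over the contiguous run of escape characters.
    while (pos > run && text[pos - 1 - run] == esc)
        ++run;
    // Even run: escapes pair up into literals, so `text[pos]` is unescaped.
    return run % 2 == 1;
}
```

For example, `isEscaped("////////,", 8, '/')` is false (eight escapes, so the comma is a real delimiter), while `isEscaped("///////,", 7, '/')` is true. Checking parity per delimiter this way rescans runs, so a single left-to-right pass is usually preferable for performance.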

Am I missing somewhere on the web, or another SO question, where this is handled? If not, put your big brains to work... I think this problem would be nice to have on SO for the public good. I'm working on it myself, but don't have a good solution yet.

Solution

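#include <cstddef>
#include <string>
#include <vector>

using std::string;
using std::vector;

// Splits `text` on unescaped `delim` characters and appends the non-empty
// fields to `tokens`. Any character following `esc` is taken literally,
// so "//" yields one '/' and "/," yields a literal ','.
void smartSplit(string const& text, char delim, char esc, vector<string>& tokens)
{
    enum State { NORMAL, IN_ESC };  // IN_ESC: previous char was an unescaped escape
    State state = NORMAL;
    string frag;                    // field currently being accumulated

    for (std::size_t i = 0; i < text.length(); ++i)
    {
        char c = text[i];
        switch (state)
        {
        case NORMAL:
            if (c == delim)
            {
                // End of a field; empty fields are skipped (rule 6).
                if (!frag.empty())
                    tokens.push_back(frag);
                frag.clear();
            }
            else if (c == esc)
                state = IN_ESC;
            else
                frag.append(1, c);
            break;
        case IN_ESC:
            // Escaped character: keep it literally, whatever it is.
            frag.append(1, c);
            state = NORMAL;
            break;
        }
    }
    // Flush the last field. A trailing, unpaired escape character is dropped.
    if (!frag.empty())
        tokens.push_back(frag);
}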
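
The answer's `smartSplit` fills a caller-supplied `vector<string>&`, whereas the question's C# signature returns the array. Below is a sketch of a value-returning variant of the same two-state machine (a `bool` stands in for the enum), plus an inverse used only to check the round trip. Both names, `smartSplitValue` and `escapeJoin`, are hypothetical and assume the same rules: empty fields are dropped and a dangling escape at the end is ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Value-returning variant of the state-machine split (hypothetical name).
std::vector<std::string> smartSplitValue(const std::string& text, char delim, char esc)
{
    std::vector<std::string> tokens;
    std::string frag;
    bool escaped = false;  // true right after an unescaped escape character

    for (char c : text)
    {
        if (escaped)
        {
            frag.push_back(c);  // any character after an escape is literal
            escaped = false;
        }
        else if (c == esc)
            escaped = true;
        else if (c == delim)
        {
            if (!frag.empty())  // skip empty fields (rule 6)
                tokens.push_back(frag);
            frag.clear();
        }
        else
            frag.push_back(c);
    }
    if (!frag.empty())
        tokens.push_back(frag);
    return tokens;
}

// Hypothetical inverse: escape every delim/esc inside each field, then join.
std::string escapeJoin(const std::vector<std::string>& fields, char delim, char esc)
{
    std::string out;
    for (std::size_t i = 0; i < fields.size(); ++i)
    {
        if (i > 0)
            out.push_back(delim);
        for (char c : fields[i])
        {
            if (c == delim || c == esc)
                out.push_back(esc);
            out.push_back(c);
        }
    }
    return out;
}
```

For example, `smartSplitValue("////////,a/,b", ',', '/')` yields `{"////", "a,b"}`. Because `escapeJoin` escapes both special characters, splitting its output recovers the original fields, as long as none of them is empty.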
