Split using delimiter except when delimiter is escaped


Question

I'm reading clipboard data from Excel using

var stream = (System.IO.Stream)(Forms.Clipboard.GetDataObject()).GetData(Forms.DataFormats.CommaSeparatedValue);

but unfortunately, Excel is passing cell text instead of cell values. When the cells use special formatting (such as the thousands separator), the clipboard data for a series of cells in columns that look like this:

 1,234,123.00    2,345.00    342.00      12,345.00

is stored as:

\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"

when what I really want is:

 1234123.00, 2345.00, 342.00, 12345.00
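Once the cell text has been split out, the thousands separators can be dropped by parsing each cell as a number instead of editing the text. This is a hedged sketch: it assumes the cells are already separated (with quotes removed) and that Excel formatted them with ',' as the group separator and '.' as the decimal point, which is why it pins `CultureInfo.InvariantCulture`. With a different regional setting, the culture passed to `decimal.Parse` would need to match the user's locale.

```csharp
using System;
using System.Globalization;

// Example cell text, assumed to be already split out of the CSV row.
string[] cellTexts = { " 1,234,123.00 ", " 2,345.00 ", " 342.00 ", " 12,345.00 " };

var values = new decimal[cellTexts.Length];
for (int i = 0; i < cellTexts.Length; i++)
{
    // NumberStyles.Number allows leading/trailing white space, a sign,
    // a decimal point and thousands separators. InvariantCulture fixes ','
    // as the group separator and '.' as the decimal point.
    values[i] = decimal.Parse(cellTexts[i], NumberStyles.Number, CultureInfo.InvariantCulture);
}

// Write the values back out without group separators, keeping two decimals.
string joined = string.Join(", ",
    Array.ConvertAll(values, v => v.ToString("0.00", CultureInfo.InvariantCulture)));
Console.WriteLine(joined); // 1234123.00, 2345.00, 342.00, 12345.00
```

Parsing to `decimal` also gives you the cell values themselves, which avoids the floating-point rounding you could get from `double` on currency-style data.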

I had previously been using clipData.Split(new string[] { "," }, StringSplitOptions.None) to turn my CSV clipboard data into a series of cells, but this fails when escaped, formatted text contains commas.
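To see the failure concretely, here is a small sketch that runs the plain `Split` call on the clipboard string from the question. It assumes the `\"` sequences shown in the question are C# escapes for ordinary double quotes; the variable names are illustrative only.

```csharp
using System;

// The clipboard text from the question, written as a C# string literal.
string clipData = "\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"";

// Splitting on every comma ignores the quotes, so the commas inside the
// formatted numbers are treated as delimiters too.
string[] naive = clipData.Split(new[] { "," }, StringSplitOptions.None);

Console.WriteLine(naive.Length); // 8 pieces instead of the expected 4
foreach (string piece in naive)
{
    Console.WriteLine($"[{piece}]");
}
```

The first piece is `" 1` and the second is `234`, so a splitter has to know whether it is currently inside a quoted cell before treating a comma as a delimiter.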

I'm asking if anyone can think of a way to split this string into a set of cells, ignoring the commas escaped within the \" bits, since this is how Excel chooses to escape cells containing commas.

In short, how can I turn a single string containing this:

\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"

into an array of strings containing this:

{ "1,234,123.00", "2,345.00", "342.00", "12,345.00" }

without breaking my ability to parse a simple comma-delimited string.

Edit:

Follow-up question (formulated as a DFA) here: Split a string based on each time a Deterministic Finite Automata reaches a final state?

Recommended answer

First off, I've dealt with data from Excel before, and what you typically see is comma-separated values. If a value is considered to be a string, it has double quotes around it (and can contain commas and double quotes). If it is considered to be numeric, there are no double quotes. Additionally, a double quote inside the data is escaped by doubling it, like "". Assuming all of that, here's how I've dealt with this in the past:

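using System.Collections.Generic;

// Hypothetical class name: extension methods must live in a static class.
public static class ExcelRowExtensions
{
    // Splits one Excel CSV row on commas that are not inside double quotes.
    public static IEnumerable<string> SplitExcelRow(this string value)
    {
        // Temporarily replace escaped quotes ("") so they don't toggle the quoted state.
        value = value.Replace("\"\"", "&quot;");
        bool quoted = false;
        int currStartIndex = 0;
        for (int i = 0; i < value.Length; i++)
        {
            char currChar = value[i];
            if (currChar == '"')
            {
                quoted = !quoted; // entering or leaving a quoted cell
            }
            else if (currChar == ',' && !quoted)
            {
                // An unquoted comma ends the current cell.
                yield return CleanCell(value.Substring(currStartIndex, i - currStartIndex));
                currStartIndex = i + 1;
            }
        }
        // The last cell has no trailing comma.
        yield return CleanCell(value.Substring(currStartIndex));
    }

    // Trims the cell, drops the enclosing quotes and restores escaped quotes.
    private static string CleanCell(string cell)
    {
        return cell.Trim()
            .Replace("\"", "")
            .Replace("&quot;", "\"");
    }
}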

Of course, this assumes the incoming data is valid, so if you have something like "fo,o"b,ar","bar""foo" this will not work. Additionally, if your data contains &quot;, it will be turned into a ", which may or may not be desirable.
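One way to avoid the `&quot;` placeholder problem is to treat the row as a small state machine (the same idea as the DFA follow-up question): track whether you are inside quotes, turn a doubled `""` inside quotes into a literal quote, and split only on commas outside quotes. This is a sketch of an alternative technique, not the answer's method; `SplitCsvRow` is a hypothetical name. Like the answer, it trims every cell, so spaces at the ends of a quoted cell are removed too, and it does not reject malformed input such as "fo,o"b,ar".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits one CSV row with a two-state machine
// (inside quotes / outside quotes), so no placeholder text is needed.
static List<string> SplitCsvRow(string row)
{
    var cells = new List<string>();
    var current = new StringBuilder();
    bool inQuotes = false;

    for (int i = 0; i < row.Length; i++)
    {
        char c = row[i];
        if (inQuotes)
        {
            if (c == '"')
            {
                // A doubled quote inside a quoted cell is an escaped literal quote.
                if (i + 1 < row.Length && row[i + 1] == '"')
                {
                    current.Append('"');
                    i++; // skip the second quote of the pair
                }
                else
                {
                    inQuotes = false; // closing quote
                }
            }
            else
            {
                current.Append(c); // commas in here are data, not delimiters
            }
        }
        else if (c == '"')
        {
            inQuotes = true; // opening quote
        }
        else if (c == ',')
        {
            // An unquoted comma ends the current cell.
            cells.Add(current.ToString().Trim());
            current.Clear();
        }
        else
        {
            current.Append(c);
        }
    }

    // The last cell has no trailing comma.
    cells.Add(current.ToString().Trim());
    return cells;
}

List<string> cells = SplitCsvRow("\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"");
Console.WriteLine(string.Join(" | ", cells)); // 1,234,123.00 | 2,345.00 | 342.00 | 12,345.00

// Escaped quotes and literal "&quot;" text both survive unchanged.
List<string> tricky = SplitCsvRow("a,\"say \"\"hi\"\"\",&quot;");
Console.WriteLine(string.Join(" | ", tricky)); // a | say "hi" | &quot;
```

Because the doubled quote is handled at the character where it occurs, an empty quoted cell such as `a,"",b` also comes out as an empty string rather than a lone quote.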
