Split String in C#


Question

I thought this would be trivial, but I can't get it to work.

Assume a line in a CSV file:

"Barack Obama", 48, "President", "1600 Penn Ave, Washington DC"

string[] tokens = line.Split(',');

I expect this:

 "Barack Obama"
 48
 "President"
 "1600 Penn Ave, Washington DC"

But the last token is 'Washington DC', not "1600 Penn Ave, Washington DC".
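To make the failure concrete, here is a minimal sketch of what a plain comma split does to the sample line. The class and method names (`NaiveSplitDemo`, `NaiveSplit`) are hypothetical and exist only for this illustration.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Hypothetical helper: splits on every comma, with no awareness of quotes.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }
}
```

Calling `NaiveSplitDemo.NaiveSplit(line)` on the sample line yields five tokens rather than four: the quoted address is cut at its inner comma, so the last two tokens are ` "1600 Penn Ave` and ` Washington DC"` (each with a leading space and one stray quote).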

Is there an easy way to get the split function to ignore the comma within quotes?

I have no control over the CSV file, and it doesn't get sent to me. Customer A will be using the app to read files provided by an external individual.

Recommended answer

You might have to write your own split function.


  • Iterate through each char in the string
  • When you hit a " character, toggle a boolean
  • When you hit a comma, if the bool is true, ignore it (it is inside quotes); otherwise, you have your token

Here's an example:

using System.Collections.Generic; // List<T>
using System.Text;                // StringBuilder

public static class StringExtensions
{
    public static string[] SplitQuoted(this string input, char separator, char quotechar)
    {
        List<string> tokens = new List<string>();

        StringBuilder sb = new StringBuilder();
        bool escaped = false;
        foreach (char c in input)
        {
            if (c.Equals(separator) && !escaped)
            {
                // we have a token
                tokens.Add(sb.ToString().Trim());
                sb.Clear();
            }
            else if (c.Equals(separator) && escaped)
            {
                // ignore but add to string
                sb.Append(c);
            }
            else if (c.Equals(quotechar))
            {
                escaped = !escaped;
                sb.Append(c);
            }
            else
            {
                sb.Append(c);
            }
        }
        tokens.Add(sb.ToString().Trim());

        return tokens.ToArray();
    }
}



Then just call:

string[] tokens = line.SplitQuoted(',','\"');
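Note that `SplitQuoted` keeps the quote characters in each token, which matches the output the question asked for. If the caller would rather have the bare values, the same toggle-on-quote idea can drop the quotes as it goes. The following is only a sketch under that assumption; `QuotedSplitSketch` and `SplitAndUnquote` are hypothetical names, not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedSplitSketch
{
    // Hypothetical variant of SplitQuoted: same state machine,
    // but quote characters are not copied into the tokens.
    public static string[] SplitAndUnquote(string input, char separator, char quoteChar)
    {
        var tokens = new List<string>();
        var sb = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == quoteChar)
            {
                // Entering or leaving a quoted section; the quote itself is dropped.
                inQuotes = !inQuotes;
            }
            else if (c == separator && !inQuotes)
            {
                // Separator outside quotes: the current token is complete.
                tokens.Add(sb.ToString().Trim());
                sb.Clear();
            }
            else
            {
                // Ordinary character, or a separator inside quotes.
                sb.Append(c);
            }
        }

        // Whatever remains after the last separator is the final token.
        tokens.Add(sb.ToString().Trim());
        return tokens.ToArray();
    }
}
```

For the sample line, `QuotedSplitSketch.SplitAndUnquote(line, ',', '"')` gives four tokens: `Barack Obama`, `48`, `President` and `1600 Penn Ave, Washington DC`. Like the original, this sketch does not handle doubled quotes (`""`) used as an escape inside a quoted field, and `Trim()` also removes leading or trailing whitespace that was inside the quotes.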






Benchmarks

Results of benchmarking my code and Dan Tao's code are below. I'm happy to benchmark any other solutions if people want them.

Code:

string input = "\"Barak Obama\", 48, \"President\", \"1600 Penn Ave, Washington DC\""; // Console.ReadLine()
string[] tokens = null;

// run tests
DateTime start = DateTime.Now;
for (int i = 0; i < 1000000; i++)
    tokens = input.SplitWithQualifier(',', '\"', false);
Console.WriteLine("1,000,000 x SplitWithQualifier = {0}ms", DateTime.Now.Subtract(start).TotalMilliseconds);

start = DateTime.Now;
for (int i = 0; i < 1000000; i++)
    tokens = input.SplitQuoted(',', '\"');
Console.WriteLine("1,000,000 x SplitQuoted =        {0}ms", DateTime.Now.Subtract(start).TotalMilliseconds);

Output:

1,000,000 x SplitWithQualifier = 8156.25ms
1,000,000 x SplitQuoted =        2406.25ms
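Both timings are exact multiples of 15.625 ms, which suggests they are limited by the coarse resolution `DateTime.Now` often has on Windows. That does not change the overall picture here, but for shorter runs a `System.Diagnostics.Stopwatch` is usually a better fit. A minimal sketch, assuming a hypothetical helper named `MeasureMilliseconds`:

```csharp
using System;
using System.Diagnostics;

public static class BenchmarkSketch
{
    // Hypothetical helper: runs the action repeatedly and returns elapsed milliseconds.
    public static double MeasureMilliseconds(Action action, int iterations)
    {
        // One untimed warm-up call so JIT compilation is not part of the measurement.
        action();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds;
    }
}
```

It could be used as `BenchmarkSketch.MeasureMilliseconds(() => input.SplitQuoted(',', '"'), 1000000)`. The figures reported above were produced with the `DateTime.Now` code shown, not with this helper.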
