Split String in C#
Problem description
I thought this will be trivial but I can't get this to work.
Assume a line in a CSV file:
"Barack Obama", 48, "President", "1600 Penn Ave, Washington DC"
string[] tokens = line.Split(',');
I expect this:
"Barack Obama"
48
"President"
"1600 Penn Ave, Washington DC"
But the last token is 'Washington DC', not "1600 Penn Ave, Washington DC".
Is there an easy way to get the split function to ignore the comma within quotes?
I have no control over the CSV file and it doesn't get sent to me. Customer A will be using the app to read files provided by an external individual.
Recommended answer
You might have to write your own split function.
- Iterate through each char in the string
- When you hit a " character, toggle a boolean
- When you hit a comma, if the bool is true, ignore it, else, you have your token
Here's an example:
using System.Collections.Generic;
using System.Text;

public static class StringExtensions
{
    // Splits input on separator, ignoring separators that fall between quotechar pairs.
    public static string[] SplitQuoted(this string input, char separator, char quotechar)
    {
        List<string> tokens = new List<string>();
        StringBuilder sb = new StringBuilder();
        bool escaped = false; // true while we are inside a quoted section
        foreach (char c in input)
        {
            if (c.Equals(separator) && !escaped)
            {
                // we have a token
                tokens.Add(sb.ToString().Trim());
                sb.Clear();
            }
            else if (c.Equals(separator) && escaped)
            {
                // separator inside quotes: keep it as part of the token
                sb.Append(c);
            }
            else if (c.Equals(quotechar))
            {
                // toggle the quoted state; the quote character itself is kept
                escaped = !escaped;
                sb.Append(c);
            }
            else
            {
                sb.Append(c);
            }
        }
        // add the final token
        tokens.Add(sb.ToString().Trim());
        return tokens.ToArray();
    }
}
Then just call:
string[] tokens = line.SplitQuoted(',','\"');
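Note that SplitQuoted keeps the quote characters in each token. If you want the bare field values instead, here is a minimal sketch of a variant that drops the enclosing quotes and treats a doubled quote inside a quoted field as a literal quote. The names CsvLineSketch and SplitUnquoted are hypothetical, not part of the answer above, and the sketch assumes one record per line (no embedded newlines). It also trims whitespace from every field, including quoted ones.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLineSketch
{
    // Hypothetical helper, not from the answer above: splits one CSV line,
    // drops the enclosing quotes, and treats "" inside quotes as a literal ".
    public static List<string> SplitUnquoted(string line, char separator = ',', char quote = '"')
    {
        var tokens = new List<string>();
        var sb = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == quote)
            {
                if (inQuotes && i + 1 < line.Length && line[i + 1] == quote)
                {
                    sb.Append(quote); // doubled quote inside a quoted field
                    i++;              // skip the second quote of the pair
                }
                else
                {
                    inQuotes = !inQuotes; // opening or closing quote, not stored
                }
            }
            else if (c == separator && !inQuotes)
            {
                // separator outside quotes ends the current field
                tokens.Add(sb.ToString().Trim());
                sb.Clear();
            }
            else
            {
                sb.Append(c);
            }
        }
        tokens.Add(sb.ToString().Trim()); // final field
        return tokens;
    }
}
```

Calling CsvLineSketch.SplitUnquoted(line) on the sample line from the question should give four fields, with the address kept whole and no surrounding quotes.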
Benchmarks
Results of benchmarking my code and Dan Tao's code are below. I'm happy to benchmark any other solutions if people want them.
Code:
string input = "\"Barak Obama\", 48, \"President\", \"1600 Penn Ave, Washington DC\""; // Console.ReadLine()
string[] tokens = null;

// run tests
DateTime start = DateTime.Now;
for (int i = 0; i < 1000000; i++)
    tokens = input.SplitWithQualifier(',', '\"', false);
Console.WriteLine("1,000,000 x SplitWithQualifier = {0}ms", DateTime.Now.Subtract(start).TotalMilliseconds);

start = DateTime.Now;
for (int i = 0; i < 1000000; i++)
    tokens = input.SplitQuoted(',', '\"');
Console.WriteLine("1,000,000 x SplitQuoted = {0}ms", DateTime.Now.Subtract(start).TotalMilliseconds);
Output:
1,000,000 x SplitWithQualifier = 8156.25ms
1,000,000 x SplitQuoted = 2406.25ms
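Both figures are exact multiples of 15.625 ms, which is consistent with the coarse system clock tick behind DateTime.Now. Over a million iterations that hardly changes the comparison, but for shorter runs System.Diagnostics.Stopwatch is usually the better timer. A minimal sketch of a timing helper follows. BenchmarkSketch and TimeMs are hypothetical names, and the single warm-up call is an assumption about what you want excluded from the measurement.

```csharp
using System;
using System.Diagnostics;

public static class BenchmarkSketch
{
    // Hypothetical helper: runs action the given number of times and
    // returns the elapsed wall-clock time in milliseconds.
    public static double TimeMs(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not included in the timing

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds;
    }
}
```

With the extension method above in scope, a call might look like BenchmarkSketch.TimeMs(() => input.SplitQuoted(',', '"'), 1000000).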