Does anyone know of a faster method to do String.Split()?

Question

I am reading each line of a CSV file and need to get the individual values in each column. So right now I am just using:

values = line.Split(delimiter);

where line is a string that holds the values separated by the delimiter.

Measuring the performance of my ReadNextRow method, I noticed that it spends 66% of its time in String.Split, so I was wondering if someone knows of a faster method to do this.

Thanks!

Recommended answer

It should be pointed out that split() is a questionable approach for parsing CSV files, because you may come across commas inside a quoted field, e.g.:

1,"Something, with a comma",2,3
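A quick way to see the problem, as a minimal sketch (the class and method names are made up for illustration): a plain split on ',' treats the comma inside the quotes as a delimiter, so the line above comes back as five pieces instead of four.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Hypothetical helper: splits on every comma, including commas
    // that sit inside a quoted field.
    public static string[] SplitOnCommas(string line)
    {
        return line.Split(',');
    }
}
```

For the sample line the second field is cut in two and the quote characters stay attached to the pieces.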

The other thing I'll point out, without knowing how you profiled, is to be careful about profiling this kind of low-level detail. The granularity of the Windows/PC timer might come into play, and you may have significant overhead in the loop itself, so measure some sort of control value as well.
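One way to take a measurement with a control value might look like the sketch below. It assumes System.Diagnostics.Stopwatch is an acceptable timer. The SplitTiming class and its method names are hypothetical, and the "control" is just the same loop doing trivial work.

```csharp
using System;
using System.Diagnostics;

public static class SplitTiming
{
    // Runs 'action' 'iterations' times and returns the elapsed time.
    public static TimeSpan Time(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not included
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Times String.Split against a near-empty control loop.
    public static (TimeSpan Split, TimeSpan Control) Compare(string line, int iterations)
    {
        int sink = 0; // accumulates results so the work is actually used
        TimeSpan control = Time(() => { sink += line.Length; }, iterations);
        TimeSpan split = Time(() => { sink += line.Split(',').Length; }, iterations);
        GC.KeepAlive(sink);
        return (split, control);
    }
}
```

If the two times are close, most of what you measured is loop and delegate overhead rather than the split itself, and a larger iteration count helps with timer granularity.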

That being said, a regex-based split is far more machinery than you need (and the wrong tool for dealing with escaped commas anyway). Java's String.split() takes a regular expression; .NET's String.Split() matches plain separators instead, but either way split() creates lots of temporary objects: a new array and a new string for every field, on every call.

So if you want to speed it up (and I have trouble believing that performance of this part is really an issue) then you want to do it by hand and you want to reuse your buffer objects so you're not constantly creating objects and giving the garbage collector work to do in cleaning them up.

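The algorithm is relatively simple:

  • Stop at every comma;
  • When you hit a quote, keep going until you hit the matching closing quote;
  • Handle escaped quotes (i.e. \") and, arguably, escaped commas (\,).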
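The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a complete CSV parser. The CsvLineSplitter class is a hypothetical name, fields are assumed never to span multiple lines, and in addition to the backslash escapes listed above it accepts the doubled-quote ("") convention that many CSV files use.

```csharp
using System.Collections.Generic;
using System.Text;

public sealed class CsvLineSplitter
{
    // Reused across calls so each line does not allocate a new buffer.
    private readonly StringBuilder _field = new StringBuilder();

    // Splits 'line' into 'fields'. The list is cleared first,
    // so the caller can pass the same list for every row.
    public void Split(string line, char delimiter, List<string> fields)
    {
        fields.Clear();
        _field.Clear();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];

            if (c == '\\' && i + 1 < line.Length)
            {
                // Backslash escape: take the next character literally (\" or \,).
                _field.Append(line[++i]);
            }
            else if (c == '"')
            {
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    // A doubled quote inside a quoted field stands for one quote.
                    _field.Append('"');
                    i++;
                }
                else
                {
                    // Opening or closing quote: not part of the value.
                    inQuotes = !inQuotes;
                }
            }
            else if (c == delimiter && !inQuotes)
            {
                // Unquoted delimiter: the current field is complete.
                fields.Add(_field.ToString());
                _field.Clear();
            }
            else
            {
                _field.Append(c);
            }
        }

        // The last field has no trailing delimiter.
        fields.Add(_field.ToString());
    }
}
```

Create one CsvLineSplitter and one List<string> up front and pass the same list to Split for every row. This avoids a new array and a new buffer per line, although each field value is still allocated as a string.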

Oh and to give you some idea of the cost of regex, there was a question (Java not C# but the principle was the same) where someone wanted to replace every n-th character with a string. I suggested using replaceAll() on String. Jon Skeet manually coded the loop. Out of curiosity I compared the two versions and his was an order of magnitude better.

So if you really want performance, it's time to hand parse.

Or, better yet, use someone else's optimized solution like this fast CSV reader: http://www.codeproject.com/KB/database/CsvReader.aspx?fid=142714&df=90&mpp=25&noise=3&sort=Position&view=Quick&fr=126&select=2741699

By the way, while this is in relation to Java, it concerns the performance of regular expressions in general (which is universal) and replaceAll() vs a hand-coded loop: Putting char into a java string for each N characters (http://stackoverflow.com/questions/537174/putting-char-into-a-java-string-for-each-n-characters).
