Does anyone know of a faster method to do String.Split()?


Question

I am reading each line of a CSV file and need to get the individual values in each column. So right now I am just using:

values = line.Split(delimiter);

Measuring the performance of my ReadNextRow method, I noticed that it spends 66% of its time in String.Split, so I was wondering if someone knows of a faster method to do this.

Thanks

Recommended answer

It should be pointed out that split() is a questionable approach for parsing CSV files, because a field may itself contain a comma, e.g.:

1,"Something, with a comma",2,3
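To make the problem concrete, here is a minimal sketch that runs the line above through String.Split. The class and method names are made up for illustration. The row has four logical columns, but Split returns five pieces because it cuts the quoted field at the embedded comma.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
class SplitPitfall
{
    // Splits on every comma, with no awareness of quotes.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }

    static void Main()
    {
        string line = "1,\"Something, with a comma\",2,3";
        string[] values = NaiveSplit(line);

        // The quoted field is cut in two, so this prints 5, not 4.
        Console.WriteLine(values.Length);
        foreach (string value in values)
        {
            Console.WriteLine("[" + value + "]");
        }
    }
}
```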

The other thing I'll point out, without knowing how you profiled, is to be careful about profiling this kind of low-level detail. The granularity of the Windows/PC timer might come into play, and the loop itself may carry significant overhead, so use some sort of control value.
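One way to get such a control value, sketched under assumptions: time many iterations with Stopwatch so the total is well above the timer's resolution, then time the same loop without the Split call and compare. The Measure helper, the sample line, and the iteration count are all made up for illustration, not taken from the question.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing harness; names and numbers are illustrative only.
class SplitTiming
{
    // Runs `body` once as a warm-up, then times `iterations` further calls.
    public static TimeSpan Measure(int iterations, Action body)
    {
        body(); // warm-up so JIT compilation is not counted
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            body();
        }
        watch.Stop();
        return watch.Elapsed;
    }

    static void Main()
    {
        const int iterations = 1000000;
        string line = "1,2,3,4,5,6,7,8,9,10";
        char[] delimiter = { ',' };
        int sink = 0; // consumed at the end so the work is not optimized away

        // Control: same loop and delegate call, but no splitting.
        TimeSpan control = Measure(iterations, () => { sink += line.Length; });
        TimeSpan split = Measure(iterations, () => { sink += line.Split(delimiter).Length; });

        Console.WriteLine("control: " + control.TotalMilliseconds + " ms");
        Console.WriteLine("split:   " + split.TotalMilliseconds + " ms");
        Console.WriteLine("net:     " + (split - control).TotalMilliseconds + " ms (sink " + sink + ")");
    }
}
```

The difference between the two figures is a rough estimate of what Split itself costs. If it is small next to the time spent reading the file, the 66% figure deserves a second look.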

That being said, split() is built to handle regular expressions, which are obviously more complex than you need (and the wrong tool to deal with escaped commas anyway). Also, split() creates lots of temporary objects.

So if you want to speed it up (and I have trouble believing that performance of this part is really an issue) then you want to do it by hand and you want to reuse your buffer objects so you're not constantly creating objects and giving the garbage collector work to do in cleaning them up.

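The algorithm is relatively simple:

  • When you hit a quote, continue until you hit the next quote;

  • Handle escaped quotes (i.e. \") and arguably escaped commas (\,).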
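The steps above could be sketched roughly as follows. This is only an illustration, not a complete CSV parser: CsvLineSplitter is a made-up name, the backslash escapes follow the description above, and doubled quotes ("") and fields spanning several lines are not handled. The StringBuilder and the caller's list are reused from row to row, so the only allocations per row are the field strings themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of any library.
class CsvLineSplitter
{
    // Reused across calls so each row does not allocate a new buffer.
    private readonly StringBuilder field = new StringBuilder();

    // Clears `values` and fills it with the fields of one line.
    public void Split(string line, char delimiter, List<string> values)
    {
        values.Clear();
        field.Length = 0;
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '\\' && i + 1 < line.Length)
            {
                // Escaped character (\" or \,): keep the next character literally.
                field.Append(line[++i]);
            }
            else if (c == '"')
            {
                // Entering or leaving a quoted section; the quote itself is dropped.
                inQuotes = !inQuotes;
            }
            else if (c == delimiter && !inQuotes)
            {
                // An unquoted delimiter ends the current field.
                values.Add(field.ToString());
                field.Length = 0;
            }
            else
            {
                field.Append(c);
            }
        }
        values.Add(field.ToString()); // the last field has no trailing delimiter
    }

    static void Main()
    {
        CsvLineSplitter splitter = new CsvLineSplitter();
        List<string> values = new List<string>();
        splitter.Split("1,\"Something, with a comma\",2,3", ',', values);
        foreach (string value in values)
        {
            Console.WriteLine("[" + value + "]"); // four fields, comma preserved in the second
        }
    }
}
```

In a ReadNextRow-style loop you would create the splitter and the list once, then call Split for every line.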

Oh and to give you some idea of the cost of regex, there was a question (Java not C# but the principle was the same) where someone wanted to replace every n-th character with a string. I suggested using replaceAll() on String. Jon Skeet manually coded the loop. Out of curiosity I compared the two versions and his was an order of magnitude better.

So if you really want performance, it's time to hand parse.

Or, better yet, use someone else's optimized solution like this fast CSV reader.

By the way, while this is in relation to Java it concerns the performance of regular expressions in general (which is universal) and replaceAll() vs a hand-coded loop: Putting char into a java string for each N characters.
