Fast CSV parsing
Problem description
I have a Java server app that downloads a CSV file and parses it. The parsing can take from 5 to 45 minutes, and it happens every hour. This method is the bottleneck of the app, so it's not premature optimization. The code so far:
client.executeMethod(method);
InputStream in = method.getResponseBodyAsStream(); // this is the HTTP stream
String line;
String[] record;
reader = new BufferedReader(new InputStreamReader(in), 65536);
try {
    // read the header line
    line = reader.readLine();
    // some code
    while ((line = reader.readLine()) != null) {
        // more code
        // replace empty quoted fields with NULL
        line = line.replaceAll("\"\"", "\"NULL\"");
        // now remove all of the quotes
        line = line.replaceAll("\"", "");
        if (!line.startsWith("ERROR")) {
            // bla bla
            continue;
        }
        record = line.split(",");
        // more error handling
        // build the object and put it in a HashMap
    }
    // exception handling, closing the connection and reader
Is there an existing library that would help me speed things up? Can I improve the existing code?
Solution
Apache Commons CSV
Have you seen Apache Commons CSV?
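If adding a library is not an option, the two replaceAll calls and the split in the question's loop can probably be folded into a single pass over each line, which avoids compiling a regular expression and allocating intermediate strings per line. The following is only a sketch: LineSplitter and splitLine are hypothetical names, not part of the original code. It treats a pair of adjacent quotes as NULL and drops every other quote, like the original loop does. Also like the original, it does not handle commas inside quoted fields. Unlike split, it keeps trailing empty fields.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSplitter {

    // Hypothetical helper: splits one CSV line on commas in a single pass.
    // A pair of adjacent quotes ("") becomes NULL; any other quote is dropped.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int n = line.length();
        for (int i = 0; i < n; i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (i + 1 < n && line.charAt(i + 1) == '"') {
                    field.append("NULL"); // "" stands for a missing value
                    i++;                  // skip the second quote of the pair
                }
                // a lone quote is simply dropped
            } else if (c == ',') {
                fields.add(field.toString()); // end of the current field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        // The input line is: "a","","b"
        System.out.println(splitLine("\"a\",\"\",\"b\"")); // prints [a, NULL, b]
    }
}
```

Whether this helps in practice depends on where the time actually goes. If the HTTP download dominates, the parsing changes will make little difference, so it is worth profiling first.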
Caveat on using split
Another thing to bear in mind is that split only returns a view of the data, meaning that the original line object is not eligible for garbage collection whilst there is a reference to any of its views. Perhaps making a defensive copy will help? (Java bug report) Note that this applies to JVMs older than Java 7 update 6, where substring shared the parent string's character array; later versions copy the characters instead.
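A defensive copy of the fields could look like the sketch below. DefensiveSplit and splitAndCopy are hypothetical names used for illustration only. The copy relies on the String(String) constructor. On the older JVMs affected by the view behaviour, that constructor allocates a fresh character array sized to the field. On current JVMs the copy is redundant but harmless.

```java
public class DefensiveSplit {

    // Hypothetical helper: splits a line and copies every field so that
    // no field keeps the (possibly much longer) original line reachable.
    static String[] splitAndCopy(String line) {
        String[] parts = line.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = new String(parts[i]); // explicit copy of the field
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] record = splitAndCopy("id,name,price");
        System.out.println(record.length); // prints 3
    }
}
```

Only the fields that end up stored in the HashMap need to be copied. Copying fields that are discarded right after parsing just adds work.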