Fast CSV parsing


Problem description



I have a Java server app that downloads a CSV file and parses it. The parsing can take from 5 to 45 minutes and happens every hour. This method is the bottleneck of the app, so it's not premature optimization. The code so far:

        client.executeMethod(method);
        InputStream in = method.getResponseBodyAsStream(); // this is the HTTP stream

        String line;
        String[] record;

        reader = new BufferedReader(new InputStreamReader(in), 65536);

        try {
            // read the header line
            line = reader.readLine();
            // some code
            while ((line = reader.readLine()) != null) {
                // more code

                // replace empty quoted fields ("") with "NULL"
                line = line.replaceAll("\"\"", "\"NULL\"");

                // now remove all of the quotes
                line = line.replaceAll("\"", "");

                if (!line.startsWith("ERROR")) {
                    // bla bla
                    continue;
                }

                record = line.split(",");
                // more error handling
                // build the object and put it in HashMap
            }
        }
        // exception handling, closing connection and reader

Is there any existing library that would help me speed things up? Can I improve the existing code?

Solution

Apache Commons CSV

Have you seen Apache Commons CSV?

Caveat On Using split

Another thing to bear in mind is that, on JVMs older than Java 7 update 6, split only returns views of the data: each returned substring shares the backing character array of the original line, so that array is not eligible for garbage collection while a reference to any of the views remains. Perhaps making a defensive copy of each field will help? (Java bug report)
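
A minimal sketch of the defensive copy mentioned above, using only the standard library. The class name `DefensiveSplit` and the method `splitAndCopy` are hypothetical names chosen for illustration; they are not part of the original code. The copy is only expected to matter on older JVMs where substrings share the parent's backing array; on newer JVMs it is harmless but probably unnecessary.

```java
final class DefensiveSplit {

    private DefensiveSplit() {
    }

    /**
     * Splits a line on commas and replaces every field with a fresh String.
     * On JVMs where split() returns views into the original line, the copy
     * lets the (possibly long) line be garbage collected even though the
     * individual fields are kept, e.g. as values in a HashMap.
     */
    static String[] splitAndCopy(String line) {
        String[] parts = line.split(",");
        for (int i = 0; i < parts.length; i++) {
            // new String(String) is the defensive copy; assumption: the
            // caller keeps only some of these fields alive for a long time
            parts[i] = new String(parts[i]);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] record = splitAndCopy("a,b,,d");
        System.out.println(record.length + " fields, first = " + record[0]);
    }
}
```

In the loop from the question, this would stand in for the plain `line.split(",")` call. Whether it helps in practice depends on the JVM version and on how many fields are retained, so it is worth measuring before and after.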
