Open CSV Performance to write data


Question

I came across this link: https://github.com/hyee/OpenCSV, which drastically improves the time to write a JDBC ResultSet to CSV thanks to setAsyncMode and RESULT_FETCH_SIZE:

//Extract ResultSet to CSV file, auto-compress if the fileName extension is ".zip" or ".gz"
//Returns number of records extracted
public int ResultSet2CSV(final ResultSet rs, final String fileName, final String header, final boolean aync) throws Exception {
    try (CSVWriter writer = new CSVWriter(fileName)) {
        //Define fetch size(default as 30000 rows), higher to be faster performance but takes more memory
        ResultSetHelperService.RESULT_FETCH_SIZE=10000;
        //Define MAX extract rows, -1 means unlimited.
        ResultSetHelperService.MAX_FETCH_ROWS=20000;
        writer.setAsyncMode(aync);
        int result = writer.writeAll(rs, true);
        return result - 1;
    }
}

But the problem is that I don't know how to merge the above into my requirement. The link involves many other classes, and I am not sure what they do or whether I even need them for my requirement. Still, I tried, but the code fails to compile whenever I enable the two commented lines. Below is my code.

Any help on how I can achieve this will be greatly appreciated.

package test;



import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Date;


import com.opencsv.CSVWriter;
import com.opencsv.ResultSetHelperService;

public class OpenCSVTest1
{

    static Connection con =null;
    static Statement stmt = null;
    static ResultSet rs = null;

    public static void main(String args[]) throws Exception
    {
        connection();
        retrieveData(con);
    }

    private static void connection() throws Exception
    {
        try
        {
            Class.forName("<jdbcdriver>");
            con = DriverManager.getConnection("jdbc:","<username>","<pass>");
            System.out.println("Connection successful");
        }
        catch (Exception e)
        {
            System.out.println("Exception while establishing sql connection");
            throw e;
        }
    }

    private static void retrieveData(Connection con) throws Exception
    {
        try
        {
            stmt = con.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            String query = "SELECT * FROM dbo.tablename";

            rs = stmt.executeQuery(query);

            CSVWriter writer = new CSVWriter(new BufferedWriter(new FileWriter("C:\\Data\\File1.csv")));

            ResultSetHelperService service = new ResultSetHelperService();

            /***    ResultSetHelperService.RESULT_FETCH_SIZE=10000;   ***/    // to add

            service.setDateTimeFormat("yyyy-MM-dd HH:mm:ss.SSS");

            System.out.println("**** Started writing Data to CSV **** " + new Date());

            writer.setResultService(service);

            /***   writer.setAsyncMode(aync);  ***/   // to add

            int lines = writer.writeAll(rs, true, true, false);

            writer.flush();
            writer.close();

            System.out.println("** OpenCSV -Completed writing the resultSet at " + new Date() + " Number of lines written to the file " + lines);
        }
        catch (Exception e)
        catch (Exception e)
        {
            System.out.println("Exception while retrieving data" );
            e.printStackTrace();
            throw e;
        }

        finally
        {
            rs.close();
            stmt.close();
            con.close();
        }
    }
}

Update

I have updated my code. Right now the code writes the complete ResultSet to CSV at once using the writeAll method, which is time-consuming.

Now what I want to do is write the ResultSet to CSV in batches. The ResultSet's first column will always be an auto-increment column (Sqno), generated dynamically by the SELECT query, with values (1, 2, 3, ...), so I am not sure how I can read the first column of the ResultSet and split it accordingly to write to CSV in batches. Maybe a HashMap might help, so I have also added ResultSet-to-HashMap conversion code in case it is required.

import com.opencsv.CSVWriter;
import com.opencsv.ResultSetHelperService;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class OpenCSVTest1
{ 
    static int fetchlimit_src  = 100;
    static Connection con =null;
    static Statement stmt = null;
    static ResultSet rs = null;
    static String filename = "C:\\Data\\filename.csv";
    static CSVWriter writer;
    public static void main(String args[])
    {  
        try
        {  
            connection();
            retrieveData(con);
        }
        catch(Exception e)
        { 
            System.out.println(e);
        }  
    }
    private static void connection() throws Exception 
    {
        try
        {
            Class.forName("<jdbcdriver>");
            con = DriverManager.getConnection("jdbc:","<username>","<pass>");
            System.out.println("Connection successful");
        }
        catch (Exception e)
        {
            System.out.println("Exception while establishing sql connection");
            throw e;
        }
    }  
    private static void retrieveData(Connection con) throws Exception
    {
        try
        {
            String query = "SELECT ROWNUM AS Sqno, * FROM dbo.tablename ";   // Oracle
            //  String query = "SELECT ROW_NUMBER() OVER(ORDER BY Id ASC) AS Sqno, *  FROM dbo.tablename ";  // SQLServer
            System.out.println(query);
            stmt = con.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            stmt.setFetchSize(fetchlimit_src);
            System.out.println("**** Started querying src **** " +  new Date());
            rs=stmt.executeQuery(query);
            System.out.println("**** Completing querying src **** " +  new Date());
            //  resultset_List(rs);   // If required store resultset(rs) to HashMap
            writetoCSV(rs,filename);  
            /** How to write resultset to CSV in batches instead of writing all at once to speed up write performance ? 
             * Hint: resultset first column is Autoincrement [Sqno] (1,2,3...) which might help to split result in batches.
             *
             **/
        }
        catch (Exception e)
        {
            System.out.println("Exception while retrieving data" );
            e.printStackTrace();
            throw e;
        }
        finally 
        {
            rs.close();
            stmt.close();
            con.close();
        }
    }
    private static List<Map<String, Object>> resultset_List(ResultSet rs) throws SQLException
    {
        ResultSetMetaData md = rs.getMetaData();
        int columns = md.getColumnCount();
        List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
        while (rs.next())
        {
            Map<String, Object> row = new HashMap<String, Object>(columns);
            for(int i = 1; i <= columns; ++i)
            {
                row.put(md.getColumnName(i), rs.getObject(i));
            }
            rows.add(row);
        }
        //    System.out.println(rows.toString());
        return rows;
    }
    private static void writetoCSV(ResultSet rs, String filename) throws Exception
    {
        try
        {
            writer = new CSVWriter(new BufferedWriter(new FileWriter(filename)));
            ResultSetHelperService service = new ResultSetHelperService();
            service.setDateTimeFormat("yyyy-MM-dd HH:mm:ss.SSS");
            long batchlimit = 1000;
            long Sqno  = 1;
            ResultSetMetaData rsmd = rs.getMetaData();
            String columnname = rsmd.getColumnLabel(1);  // To retrieve columns with labels (for example SELECT ROWNUM AS Sqno)
            System.out.println("**** Started writing Data to CSV **** " +  new Date());
            writer.setResultService(service);
            int lines = writer.writeAll(rs, true, true, false); 
    System.out.println("** OpenCSV -Completed writing the resultSet at " +  new Date() + " Number of lines written to the file " + lines);
        }
        catch (Exception e)
        {
            System.out.println("Exception while writing data" );
            e.printStackTrace();
            throw e;
        }
        finally
        {
            writer.flush();
            writer.close();
        }
    }
}  

Answer

You should be able to use the OpenCSV sample pretty much exactly as it is provided in the documentation. So there should be no need for you to write any of your own batching logic.

I was able to write a 6 million record result set to a CSV file in about 10 seconds. To be clear, that was just the file-write time, not the DB data-fetch time, but I think that should be fast enough for your needs.

Here is your code, adapted to use OpenCSV following its documented approach. But please see the warning at the end of my notes!

import com.opencsv.CSVWriter;
import com.opencsv.ResultSetHelperService;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Date;
import java.text.SimpleDateFormat;

public class OpenCSVDemo {

    static int fetchlimit_src = 100;
    static Connection con = null;
    static Statement stmt = null;
    static ResultSet rs = null;
    static String filename = "C:\\Data\\filename.csv";

    public static void main(String args[]) {
        try {
            connection();
            retrieveData(con);

        } catch (Exception e) {
            System.out.println(e);
        }
    }

    private static void connection() throws Exception {
        try {
            final String jdbcDriver = "YOURS GOES HERE";
            final String dbUrl = "YOURS GOES HERE";
            final String user = "YOURS GOES HERE";
            final String pass = "YOURS GOES HERE";
            Class.forName(jdbcDriver);
            con = DriverManager.getConnection(dbUrl, user, pass);
            System.out.println("Connection successful");
        } catch (Exception e) {
            System.out.println("Exception while establishing sql connection");
            throw e;
        }
    }

    private static void retrieveData(Connection con) throws Exception {
        try {
            String query = "select title_id, primary_title from imdb.title";
            System.out.println(query);
            stmt = con.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            stmt.setFetchSize(fetchlimit_src);
            System.out.println("**** Started querying src **** " + new Date());
            rs = stmt.executeQuery(query);
            System.out.println("**** Completing querying src **** " + new Date());
            //  resultset_List(rs);   // If required store resultset(rs) to HashMap

            System.out.println();
            String timeStamp = new SimpleDateFormat("yyyy.MM.dd.HH.mm.ss").format(new Date());
            System.out.println("Started writing CSV:  " + timeStamp);
            writeToCsv(rs, filename, null, Boolean.FALSE);
            timeStamp = new SimpleDateFormat("yyyy.MM.dd.HH.mm.ss").format(new Date());
            System.out.println("Finished writing CSV: " + timeStamp);
            System.out.println();

        } catch (Exception e) {
            System.out.println("Exception while retrieving data");
            e.printStackTrace();
            throw e;
        } finally {
            rs.close();
            stmt.close();
            con.close();
        }
    }

    public static int writeToCsv(final ResultSet rs, final String fileName, 
            final String header, final boolean aync) throws Exception {
        try (CSVWriter writer = new CSVWriter(fileName)) {
            //Define fetch size(default as 30000 rows), higher to be faster performance but takes more memory
            ResultSetHelperService.RESULT_FETCH_SIZE = 1000;
            //Define MAX extract rows, -1 means unlimited.
            ResultSetHelperService.MAX_FETCH_ROWS = 2000;
            writer.setAsyncMode(aync);
            int result = writer.writeAll(rs, true);
            return result - 1;
        }
    }

}

Notes:

1) I used "async" set to false:

writeToCsv(rs, filename, null, Boolean.FALSE);

You may want to experiment with this and the other settings to see whether they make any significant difference for you.
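If it helps, here is a rough sketch of one way to time a run under different settings. It assumes the same hyee/OpenCSV fork API used above (the CSVWriter(fileName) constructor, setAsyncMode, RESULT_FETCH_SIZE and MAX_FETCH_ROWS); the class and method names are just for illustration, and each run needs its own fresh ResultSet because a ResultSet can only be consumed once.

import com.opencsv.CSVWriter;
import com.opencsv.ResultSetHelperService;
import java.sql.ResultSet;

public class CsvWriteTimer {

    // Writes the ResultSet to fileName with the given settings and returns elapsed milliseconds.
    // Note: this measures fetch + write together, since writeAll drains the ResultSet.
    public static long timedWrite(ResultSet rs, String fileName, int fetchSize, boolean async) throws Exception {
        ResultSetHelperService.RESULT_FETCH_SIZE = fetchSize;
        ResultSetHelperService.MAX_FETCH_ROWS = -1;   // -1 means unlimited, per the sample above
        long start = System.currentTimeMillis();
        try (CSVWriter writer = new CSVWriter(fileName)) {
            writer.setAsyncMode(async);
            writer.writeAll(rs, true);
        }
        return System.currentTimeMillis() - start;
    }
}

Calling it once with async set to false and once (on a fresh ResultSet) with async set to true, for a couple of fetch sizes, should quickly show whether the settings matter for your data.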

2) Regarding your comment "the link has many other classes involved": the OpenCSV library's entire JAR file needs to be included in your project, as does the related disruptor JAR:

opencsv.jar
disruptor-3.3.6.jar

To get the JAR files, go to the GitHub page, click the green button, select the zip download, unzip it, and look in the "OpenCSV-master\release" folder.

Add these two JARs to your project in the usual way (this depends on how you build your project).
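As a quick sanity check that both JARs actually ended up on the classpath, something like the sketch below will fail fast with a ClassNotFoundException if either one is missing (the disruptor class name used here is the standard LMAX one and is an assumption on my part):

public class ClasspathCheck {
    public static void main(String[] args) throws ClassNotFoundException {
        Class.forName("com.opencsv.CSVWriter");          // provided by opencsv.jar
        Class.forName("com.lmax.disruptor.RingBuffer");  // provided by disruptor-3.3.6.jar (assumed class name)
        System.out.println("OpenCSV and disruptor JARs found on the classpath");
    }
}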

3) WARNING: This code runs fine when you use Oracle's Java 8 JDK/JRE. If you try to use OpenJDK (e.g. for Java 13 or similar), it will not run. This is because of some behind-the-scenes changes to hidden classes. If you are interested, there are more details here.

If you need to use an OpenJDK version of Java, you may therefore have better luck with the library on which this fork is based, the standard opencsv: see here.
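For what it's worth, a minimal export using the standard opencsv library looks roughly like the sketch below. It has no setAsyncMode or RESULT_FETCH_SIZE, so the main tuning knob is Statement.setFetchSize on the JDBC side; the driver URL, credentials, query and output path are placeholders.

import com.opencsv.CSVWriter;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PlainOpenCsvExport {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:", "<username>", "<pass>");
             Statement stmt = con.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(10000);   // stream rows from the DB in chunks
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM dbo.tablename");
                 CSVWriter writer = new CSVWriter(new BufferedWriter(new FileWriter("C:\\Data\\File1.csv")))) {
                writer.writeAll(rs, true);   // true = include column names as the header row
            }
        }
    }
}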
