Efficiently adding huge amounts of data from CSV files into an SQLite DB in Java
Question
I'm trying to parse values from a CSV file into a SQLite DB, but the file is quite large (~2,500,000 lines). I ran my program for a few hours, printing where it was up to, but by my calculation the file would have taken about 100 hours to parse completely, so I stopped it.
I'm going to have to run this program as a background process at least once a week, on a new CSV file that is around 90% similar to the previous one. I have come up with a few solutions to improve my program; however, I don't know much about databases, so I have questions about each of them:
1. Is there a more efficient way to read a CSV file than what I have already?

2. Is instantiating an ObjectOutputStream and storing the result as a BLOB significantly computationally expensive? I could add the values directly instead, but I use the BLOB later, so storing it now saves me from instantiating a new one multiple times.

3. Would connection pooling, or changing the way I use the Connection in some other way, be more efficient?

4. I'm setting the URL column as UNIQUE so I can use INSERT OR IGNORE, but testing this on smaller datasets (~10,000 lines) indicates that there is no performance gain compared to dropping the table and repopulating. Is there a faster way to add only unique values?

5. Are there any obvious mistakes I'm making? (Again, I know very little about databases.)
public class Database{

    private Connection c; // connection field, opened elsewhere in the class

    public void createResultsTable(){
        Statement stmt;
        String sql = "CREATE TABLE results("
                + "ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, "
                + "TITLE TEXT NOT NULL, "
                + "URL TEXT NOT NULL UNIQUE, "
                ...
                ...
                + "SELLER TEXT NOT NULL, "
                + "BEAN BLOB);";
        try {
            stmt = c.createStatement();
            stmt.executeUpdate(sql);
        } catch (SQLException e) { e.printStackTrace(); }
    }
    public void addCSVToDatabase(Connection conn, String src){
        BufferedReader reader = null;
        DBEntryBean b;
        String[] vals;
        try{
            reader = new BufferedReader(new InputStreamReader(new FileInputStream(src), "UTF-8"));
            for(String line; (line = reader.readLine()) != null;){
                // Each line takes the form: "title|URL|...|...|SELLER".
                // split() takes a regex, and '|' is a metacharacter, so it must be escaped:
                vals = line.split("\\|");
                b = new DBEntryBean();
                b.setTitle(vals[0]);
                b.setURL(vals[1]);
                ...
                ...
                b.setSeller(vals[n]);
                insert(conn, b);
            }
        } catch(IOException e){
            e.printStackTrace();
        }
    }
    public void insert(Connection conn, DBEntryBean b){
        PreparedStatement pstmt = null;
        String sql = "INSERT OR IGNORE INTO results("
                + "TITLE, "
                + "URL, "
                ...
                ...
                + "SELLER, "
                + "BEAN"
                + ") VALUES (?, ?, "
                ...
                + "?, ?);"; // one '?' placeholder per column listed above
        try {
            pstmt = conn.prepareStatement(sql); // use the Connection passed in, not the field
            pstmt.setString(Constants.DB_COL_TITLE, b.getTitle());
            pstmt.setString(Constants.DB_COL_URL, b.getURL());
            ...
            ...
            pstmt.setString(Constants.DB_COL_SELLER, b.getSeller());
            // Serialize the bean into the BLOB column:
            // ByteArrayOutputStream baos = new ByteArrayOutputStream();
            // ObjectOutputStream oos = new ObjectOutputStream(baos);
            // oos.writeObject(b);
            // byte[] bytes = baos.toByteArray();
            // pstmt.setBytes(Constants.DB_COL_BEAN, bytes);
            pstmt.executeUpdate();
        } catch (SQLException e) { e.printStackTrace();
        } finally{
            if(pstmt != null){
                try{ pstmt.close(); }
                catch (SQLException e) { e.printStackTrace(); }
            }
        }
    }
}
Answer
The biggest bottleneck in your code is that you are not batching the insert operations. You should really call pstmt.addBatch() instead of pstmt.executeUpdate(), and execute the batch once you have something like 10,000 rows queued up.
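As a rough sketch of what that could look like inside the same Database class (the BATCH_SIZE name and the three example columns are illustrative, not from the original code; wrapping the whole load in a single transaction via setAutoCommit(false) is an extra, common SQLite speed-up beyond the batching itself):

    public void addCSVToDatabase(Connection conn, String src) throws IOException, SQLException {
        final int BATCH_SIZE = 10_000; // flush the batch every 10K rows
        String sql = "INSERT OR IGNORE INTO results(TITLE, URL, SELLER) VALUES (?, ?, ?);";
        conn.setAutoCommit(false);     // one transaction for the whole load
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(new FileInputStream(src), "UTF-8"));
             PreparedStatement pstmt = conn.prepareStatement(sql)) {
            int count = 0;
            for (String line; (line = reader.readLine()) != null; ) {
                String[] vals = line.split("\\|");
                pstmt.setString(1, vals[0]); // TITLE
                pstmt.setString(2, vals[1]); // URL
                pstmt.setString(3, vals[2]); // SELLER (real code sets the remaining columns too)
                pstmt.addBatch();            // queue the row instead of executing it immediately
                if (++count % BATCH_SIZE == 0) {
                    pstmt.executeBatch();    // send the queued rows to SQLite in one go
                }
            }
            pstmt.executeBatch();            // flush the final partial batch
            conn.commit();
        } catch (IOException | SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(true);
        }
    }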
On the CSV parsing side, you should really consider using a CSV library to do the parsing for you. Univocity-parsers has the fastest CSV parser around, and it should read these 2.5 million lines in less than a second. I'm the author of this library, by the way.
String.split() is convenient but not fast: it treats its argument as a regular expression (which is also why splitting on "|" requires escaping it as "\\|"). For anything more than a few dozen rows it doesn't make sense to use it.
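A minimal sketch of the read loop with univocity-parsers, assuming the pipe-delimited format from the question (check the settings calls against the library version you actually use):

    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;
    import java.io.File;

    public class CsvLoad {
        public static void main(String[] args) {
            CsvParserSettings settings = new CsvParserSettings();
            settings.getFormat().setDelimiter('|');   // the file is pipe-delimited, not comma-delimited
            settings.getFormat().setLineSeparator("\n");

            CsvParser parser = new CsvParser(settings);
            parser.beginParsing(new File(args[0]));   // path to the CSV file
            String[] row;
            while ((row = parser.parseNext()) != null) {
                // row[0] = title, row[1] = URL, ..., last = seller;
                // build the DBEntryBean and add it to the insert batch here.
            }
        }
    }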
Hope this helps.