How do I load 100 million rows into memory?


Problem Description



I need to load 100 million+ rows from a MySQL database into memory. My Java program fails with java.lang.OutOfMemoryError: Java heap space. I have 8 GB of RAM in my machine and I have given -Xmx6144m in my JVM options.

This is my code

public List<Record> loadTrainingDataSet() {

    ArrayList<Record> records = new ArrayList<Record>();
    try {
        Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
        s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings");
        ResultSet rs = s.getResultSet();
        int count = 0;
        while (rs.next()) {
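            // (the rest of the loop presumably builds a Record per row and
            //  adds it to 'records', so the entire table ends up retained on the heap)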

Any idea how to overcome this problem?


UPDATE

I came across this post, and based on it and the comments below I updated my code. It seems I am now able to load the data into memory with the same -Xmx6144m setting, but it takes a long time.

Here is my code.

...
import org.apache.mahout.math.SparseMatrix;
...

@Override
public SparseMatrix loadTrainingDataSet() {
    long t1 = System.currentTimeMillis();
    SparseMatrix ratings = new SparseMatrix(NUM_ROWS,NUM_COLS);
    int REC_START = 0;
    int REC_END = 0;

    try {
        for (int i = 1; i <= 101; i++) {
            long t11 = System.currentTimeMillis();
            REC_END = 1000000 * i;
            Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                    java.sql.ResultSet.CONCUR_READ_ONLY);
            s.setFetchSize(Integer.MIN_VALUE);
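            // Note: MySQL's LIMIT takes (offset, row_count). REC_START is never
            // advanced, so the rounds run LIMIT 0,1000000, then LIMIT 0,2000000,
            // and so on, re-reading every earlier row again; that is why each
            // round takes longer than the last.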
            ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT " + REC_START + "," + REC_END);//100480507
            while (rs.next()) {
                int movieId = rs.getInt("movie_id");
                int customerId = rs.getInt("customer_id");
                byte rating = (byte) rs.getInt("rating");
                ratings.set(customerId,movieId,rating);
            }
            long t22 = System.currentTimeMillis();
            System.out.println("Round " + i + " completed " + (t22 - t11) / 1000 + " seconds");
            rs.close();
            s.close();
        }

    } catch (Exception e) {
        System.err.println("Cannot connect to database server " + e);
    } finally {
        if (conn != null) {
            try {
                conn.close();
                System.out.println("Database connection terminated");
            } catch (Exception e) { /* ignore close errors */ }
        }
    }
    long t2 = System.currentTimeMillis();
    System.out.println(" Took " + (t2 - t1) / 1000 + " seconds");
    return ratings;
}

Loading the first 100,000 rows took 2 seconds. Loading the 29th batch of 100,000 rows took 46 seconds. I stopped the process midway because it was taking too much time. Are these acceptable amounts of time? Is there a way to improve the performance of this code? I am running this on an 8 GB RAM, 64-bit Windows machine.

Solution

A hundred million records means that each record may take up at most 50 bytes in order to fit within 6 GB, leaving some extra space for other allocations (50 bytes × 100 million ≈ 5 GB, so roughly 1 GB of the heap remains for everything else). In Java 50 bytes is nothing; a mere Object[] takes 32 bytes per element. You must find a way to use the results immediately in your while (rs.next()) loop instead of retaining them in full.
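A minimal sketch of that pattern, assuming the same conn field and ratings table as in the question, and relying on the MySQL Connector/J row-streaming mode the update already uses (setFetchSize(Integer.MIN_VALUE)). Each row is handed to a callback the moment it arrives, so neither the driver nor the application buffers the result set; the RatingConsumer interface is hypothetical and stands in for whatever per-row processing is actually needed:

import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical per-row callback; substitute the real processing here.
interface RatingConsumer {
    void accept(int movieId, int customerId, byte rating);
}

public void streamTrainingDataSet(RatingConsumer consumer) throws SQLException {
    Statement s = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                       ResultSet.CONCUR_READ_ONLY);
    // MySQL Connector/J treats Integer.MIN_VALUE as "stream row by row"
    // instead of buffering the whole result set on the client.
    s.setFetchSize(Integer.MIN_VALUE);
    ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings");
    try {
        while (rs.next()) {
            // Use each row immediately; nothing accumulates on the heap.
            consumer.accept(rs.getInt("movie_id"),
                            rs.getInt("customer_id"),
                            (byte) rs.getInt("rating"));
        }
    } finally {
        rs.close();
        s.close();
    }
}

If the data really must stay resident, a compact primitive layout (for example three parallel int/int/byte arrays, or the SparseMatrix from the update) is what keeps each record near the 50-byte budget above. And if the chunked LIMIT approach is kept, REC_START has to advance each round; better still, keyset pagination (a WHERE clause on an indexed key greater than the last value seen, then LIMIT) spares MySQL from re-scanning rows that earlier rounds already consumed.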
