How do I load 100 million rows into memory?


Problem Description



I need to load 100 million+ rows from a MySQL database into memory. My Java program fails with java.lang.OutOfMemoryError: Java heap space. I have 8 GB of RAM in my machine and I have given -Xmx6144m in my JVM options.

This is my code

public List<Record> loadTrainingDataSet() {

    ArrayList<Record> records = new ArrayList<Record>();
    try {
        Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
        s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings");
        ResultSet rs = s.getResultSet();
        int count = 0;
        while (rs.next()) {
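            // (the rest of the loop presumably builds a Record per row and
            //  adds it to 'records', so the entire table ends up retained on the heap)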

Any idea how to overcome this problem?


UPDATE

I came across this post, and based on it and the comments below I updated my code. It seems I am now able to load the data into memory with the same -Xmx6144m setting, but it takes a long time.

Here is my code.

...
import org.apache.mahout.math.SparseMatrix;
...

@Override
public SparseMatrix loadTrainingDataSet() {
    long t1 = System.currentTimeMillis();
    SparseMatrix ratings = new SparseMatrix(NUM_ROWS,NUM_COLS);
    int REC_START = 0;
    int REC_END = 0;

    try {
        for (int i = 1; i <= 101; i++) {
            long t11 = System.currentTimeMillis();
            REC_END = 1000000 * i;
            Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                    java.sql.ResultSet.CONCUR_READ_ONLY);
            s.setFetchSize(Integer.MIN_VALUE);
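            // Note: MySQL's LIMIT takes (offset, row_count). REC_START is never
            // advanced, so the rounds run LIMIT 0,1000000, then LIMIT 0,2000000,
            // and so on, re-reading every earlier row again; that is why each
            // round takes longer than the last.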
            ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT " + REC_START + "," + REC_END);//100480507
            while (rs.next()) {
                int movieId = rs.getInt("movie_id");
                int customerId = rs.getInt("customer_id");
                byte rating = (byte) rs.getInt("rating");
                ratings.set(customerId,movieId,rating);
            }
            long t22 = System.currentTimeMillis();
            System.out.println("Round " + i + " completed " + (t22 - t11) / 1000 + " seconds");
            rs.close();
            s.close();
        }

    } catch (Exception e) {
        System.err.println("Cannot connect to database server " + e);
    } finally {
        if (conn != null) {
            try {
                conn.close();
                System.out.println("Database connection terminated");
            } catch (Exception e) { /* ignore close errors */ }
        }
    }
    long t2 = System.currentTimeMillis();
    System.out.println(" Took " + (t2 - t1) / 1000 + " seconds");
    return ratings;
}

Loading the first 100,000 rows took 2 seconds. Loading the 29th batch of 100,000 rows took 46 seconds. I stopped the process midway because it was taking too much time. Are these acceptable amounts of time? Is there a way to improve the performance of this code? I am running this on an 8 GB RAM, 64-bit Windows machine.

Solution

A hundred million records means that each record may take up at most 50 bytes in order to fit within 6 GB, leaving some extra space for other allocations (50 bytes × 100 million ≈ 5 GB, so roughly 1 GB of the heap remains for everything else). In Java 50 bytes is nothing; a mere Object[] takes 32 bytes per element. You must find a way to use the results immediately in your while (rs.next()) loop instead of retaining them in full.
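A minimal sketch of that pattern, assuming the same conn field and ratings table as in the question, and relying on the MySQL Connector/J row-streaming mode the update already uses (setFetchSize(Integer.MIN_VALUE)). Each row is handed to a callback the moment it arrives, so neither the driver nor the application buffers the result set; the RatingConsumer interface is hypothetical and stands in for whatever per-row processing is actually needed:

import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical per-row callback; substitute the real processing here.
interface RatingConsumer {
    void accept(int movieId, int customerId, byte rating);
}

public void streamTrainingDataSet(RatingConsumer consumer) throws SQLException {
    Statement s = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                       ResultSet.CONCUR_READ_ONLY);
    // MySQL Connector/J treats Integer.MIN_VALUE as "stream row by row"
    // instead of buffering the whole result set on the client.
    s.setFetchSize(Integer.MIN_VALUE);
    ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings");
    try {
        while (rs.next()) {
            // Use each row immediately; nothing accumulates on the heap.
            consumer.accept(rs.getInt("movie_id"),
                            rs.getInt("customer_id"),
                            (byte) rs.getInt("rating"));
        }
    } finally {
        rs.close();
        s.close();
    }
}

If the data really must stay resident, a compact primitive layout (for example three parallel int/int/byte arrays, or the SparseMatrix from the update) is what keeps each record near the 50-byte budget above. And if the chunked LIMIT approach is kept, REC_START has to advance each round; better still, keyset pagination (a WHERE clause on an indexed key greater than the last value seen, then LIMIT) spares MySQL from re-scanning rows that earlier rounds already consumed.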
