Reading a large number of MySQL records into Java
Question
I have a MySQL database with more than 8 million records that I need to process (this can't be done in the database itself), and I run into problems when trying to read them into my Java application.
I have already tried some solutions from people with similar problems (e.g., link); however, none have worked for me. I tried setting the fetch size and so on, but no luck! My application is built around a BlockingQueue: the Producer continuously reads data from the database and stores it in the queue so that the Consumer can process it. This way I limit the number of records held in main memory at any one time.
My code works for a small number of records (I tested with 1000 records), so I suspect the phase from the database to my application is what needs to be fixed.
Edit1
connection = ConnectionFactory.getConnection(DATABASE);
preparedStatement = connection.prepareStatement(query, java.sql.ResultSet.CONCUR_READ_ONLY, java.sql.ResultSet.TYPE_FORWARD_ONLY);
preparedStatement.setFetchSize(1000);
preparedStatement.executeQuery();
rs = preparedStatement.getResultSet();
Edit2
I now finally get some output other than watching my free memory go down. I get this error:
Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.Buffer.<init>(Buffer.java:59)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:2089)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3554)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:491)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3245)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2413)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2836)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2828)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2777)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1651)
at razoralliance.dao.DataDAOImpl.getAllDataRS(DataDAOImpl.java:38)
at razoralliance.app.DataProducer.run(DataProducer.java:34)
at java.lang.Thread.run(Thread.java:722)
Edit3
I did some more research on the Producer-Consumer pattern, and it turns out that when the Consumer cannot keep up with the Producer, the queue automatically grows and eventually runs out of memory. So I switched to an ArrayBlockingQueue, which has a fixed size. However, I still get memory leaks. Eclipse Memory Analyzer says that the ArrayBlockingQueue occupies 65.31% of my memory, even though it only holds 1000 objects, each with 4 text fields.
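A minimal sketch of the fixed-size queue setup described above, assuming rows are carried as a String[] of four text fields. The class name BoundedQueueDemo, the run method, and the END sentinel are hypothetical names introduced for illustration. The point it shows is that the producer should call put(), which blocks while an ArrayBlockingQueue is full, so the number of queued rows never exceeds the capacity.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueDemo {
    // Hypothetical sentinel ("poison pill") telling the consumer that no more rows will arrive.
    private static final String[] END = new String[0];

    /** Pushes `total` dummy rows through a queue of the given capacity; returns the rows consumed. */
    public static int run(int total, int capacity) {
        BlockingQueue<String[]> queue = new ArrayBlockingQueue<>(capacity);
        int[] consumed = {0};

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks while the queue is empty
                for (String[] row = queue.take(); row != END; row = queue.take()) {
                    consumed[0]++; // real processing of the row would go here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 0; i < total; i++) {
                // put() blocks while the queue is full, so the producer cannot outrun the consumer
                queue.put(new String[] {"id" + i, "field2", "field3", "field4"});
            }
            queue.put(END);
            consumer.join(); // after join(), the consumer's writes to consumed[0] are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while producing", e);
        }
        return consumed[0];
    }

    public static void main(String[] args) {
        System.out.println("rows consumed: " + run(10_000, 1000));
    }
}
```

Note that a bounded queue only limits what sits between producer and consumer. The stack trace above shows the OutOfMemoryError being thrown inside executeQuery, that is, before any row reaches the queue.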
Recommended answer
You will need to stream your results. With the MySQL driver, it appears you have to set CONCUR_READ_ONLY and TYPE_FORWARD_ONLY for your ResultSet. Also, set the fetch size accordingly: stmt.setFetchSize(Integer.MIN_VALUE);
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
        java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this, any result sets created with the statement will be retrieved row-by-row.
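As a rough sketch of how the streaming statement could feed the producer side of a queue: the class name StreamingProducer and the method streamInto are hypothetical, and for brevity only the first column of each row is read as a String. It relies on the MySQL Connector/J behaviour quoted above (forward-only, read-only, fetch size Integer.MIN_VALUE); other drivers may treat that fetch size differently.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.BlockingQueue;

public class StreamingProducer {
    /**
     * Runs `query` with row-by-row streaming enabled and puts the first column
     * of every row into `queue`. Returns the number of rows read.
     */
    public static long streamInto(Connection conn, String query, BlockingQueue<String> queue)
            throws SQLException, InterruptedException {
        long rows = 0;
        // Forward-only + read-only + Integer.MIN_VALUE is the signal described above.
        try (Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                   ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = stmt.executeQuery(query)) {
                while (rs.next()) {
                    // put() blocks when a bounded queue is full, which also pauses reading
                    queue.put(rs.getString(1));
                    rows++;
                }
            }
        }
        return rows;
    }
}
```

The try-with-resources blocks close the result set and statement even if the loop fails. That matters here because, per the Connector/J documentation, a streaming result set must be fully read or closed before the same connection can issue another query.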
There are some caveats with this approach...
- Streaming large result sets with MySQL
- http://dev.mysql.com/doc/connector-j/en/connector-j-reference-implementation-notes.html