Using Hibernate's ScrollableResults to slowly read 90 million records


Problem description

I simply need to read each row in a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows and they are pretty big. So it seemed like the following would be appropriate:

ScrollableResults results = session.createQuery("SELECT person FROM Person person")
            .setReadOnly(true).setCacheable(false).scroll(ScrollMode.FORWARD_ONLY);
while (results.next())
    storeInFile(results.get()[0]);

The problem is that the above will try to load all 90 million rows into RAM before it ever reaches the while loop... and that will kill my memory with an OutOfMemoryError: Java heap space exception :(.

So I guess ScrollableResults isn't what I was looking for? What is the proper way to handle this? I don't mind if this while loop takes days (well, I'd rather it didn't).

I guess the only other way to handle this is to use setFirstResult and setMaxResults to iterate through the results and just use regular Hibernate results instead of ScrollableResults. That feels like it will be inefficient though and will start taking a ridiculously long time when I'm calling setFirstResult on the 89 millionth row...
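For illustration, here is a minimal sketch of that offset-based paging (the batch size, the FROM Person HQL, and the session.clear() call are assumptions for the sketch, not from the question):

int batchSize = 1000;
for (int offset = 0; ; offset += batchSize) {
    List<?> batch = session.createQuery("FROM Person")
            .setReadOnly(true)
            .setFirstResult(offset)     // becomes LIMIT ... OFFSET on MySQL
            .setMaxResults(batchSize)
            .list();
    if (batch.isEmpty())
        break;
    for (Object person : batch)
        storeInFile(person);
    session.clear();                    // evict the batch so the session doesn't grow
}

This is exactly the approach whose cost worries me: each setFirstResult call forces MySQL to scan and discard all the skipped rows before returning the page.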

UPDATE: setFirstResult/setMaxResults doesn't work, it turns out to take an unusably long time to get to the offsets like I feared. There must be a solution here! Isn't this a pretty standard procedure?? I'm willing to forgo Hibernate and use JDBC or whatever it takes.

UPDATE 2: the solution I've come up with, which works OK but not great, is basically of this form:

select * from person where id > <offset> and <other_conditions> limit 1

Since I have other conditions, even with everything in an index, it's still not as fast as I'd like... so I'm still open to other suggestions.

Solution

Using setFirstResult and setMaxResults is your only option that I'm aware of.

Traditionally a scrollable result set would only transfer rows to the client as they are required. Unfortunately MySQL Connector/J fakes it: it executes the entire query and transports the results to the client, so the driver actually has the whole result set loaded in RAM and drip-feeds it to you (your out-of-memory errors are the evidence). You had the right idea; it's just a shortcoming in the MySQL Java driver.

I found no way to get around this, so I went with loading large chunks using the regular setFirstResult/setMaxResults methods. Sorry to be the bringer of bad news.

Just make sure to use a stateless session so there's no session-level cache, dirty tracking, etc.
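For example, a minimal sketch of what that looks like (sessionFactory is assumed to be your configured SessionFactory):

// A StatelessSession has no first-level cache, no dirty checking and no
// cascading, so iterating over millions of rows does not accumulate state.
StatelessSession session = sessionFactory.openStatelessSession();
try {
    // run the paged queries here
} finally {
    session.close();
}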

EDIT:

Your UPDATE 2 is the best you're going to get unless you break out of MySQL Connector/J. There's no reason you can't raise the limit on the query, though. Provided you have enough RAM to hold the index, it should be a fairly cheap operation. I'd modify it slightly: grab a batch at a time, and use the highest id of that batch to fetch the next one.

Note: this will only work if other_conditions use equality (no range conditions allowed) and the index has id as its last column.

select * 
from person 
where id > <max_id_of_last_batch> and <other_conditions> 
order by id asc  
limit <batch_size>
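
And a hedged sketch of a driving loop for that query in plain JDBC (since the question mentions being willing to drop Hibernate). The connection details, batch size, and writeRow helper are placeholders, and the commented-out line is where the equality-only other_conditions would go:

import java.sql.*;

public class PersonExporter {
    public static void main(String[] args) throws SQLException {
        final int batchSize = 10000;
        long lastId = 0;
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/mydb", "user", "pass");   // placeholder URL/credentials
             PreparedStatement ps = conn.prepareStatement(
                 "select * from person where id > ? "
                 // + "and <other_conditions> "                    // your equality conditions
                 + "order by id asc limit ?")) {
            while (true) {
                ps.setLong(1, lastId);
                ps.setInt(2, batchSize);
                int rows = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows++;
                        lastId = rs.getLong("id");   // highest id seen, keys the next batch
                        writeRow(rs);                // placeholder: write this row to the file
                    }
                }
                if (rows < batchSize)
                    break;                           // short batch: no more rows
            }
        }
    }

    static void writeRow(ResultSet rs) throws SQLException {
        // placeholder for the file-writing step from the question
    }
}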
