Processing millions of database records in Java
Problem description
I have a requirement to write a batch job that fetches rows from a database table and, based on certain conditions, either writes to other tables or updates the row with a certain value. We are using Spring and JDBC to fetch the result set, and a standalone Java program, scheduled to run weekly, iterates through and processes the records. I know this is not the right way to do it, but we had to do it as a temporary solution. As the records grow into the millions, we will end up with out-of-memory errors, so I know this is not the best approach.
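Out-of-memory errors in this kind of job usually come from materializing the whole result set (for example into a List) before processing it. Below is a minimal sketch of streaming rows with plain JDBC instead, assuming a hypothetical `source_records` table: it uses a forward-only, read-only statement with a fetch-size hint and hands each row to a callback, so memory depends on the fetch size rather than the table size. Whether the hint is honored is driver-specific (PostgreSQL's driver, for example, only streams with a cursor when autocommit is off), so check your driver's documentation. With Spring's `JdbcTemplate`, the same idea corresponds to `setFetchSize` plus a `RowCallbackHandler`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.Consumer;

class StreamingReader {

    // Hypothetical query: table and column names are placeholders.
    static final String SQL = "SELECT id, status, amount FROM source_records";

    /**
     * Hands rows to the handler one at a time instead of collecting them,
     * so memory use follows the fetch size (if the driver honors it), not
     * the table size. Returns the number of rows seen.
     */
    static long forEachRow(Connection conn, int fetchSize, Consumer<ResultSet> handler)
            throws SQLException {
        // Some drivers (e.g. PostgreSQL) only use a server-side cursor when
        // autocommit is off; other drivers have their own rules.
        conn.setAutoCommit(false);
        long count = 0;
        try (PreparedStatement ps = conn.prepareStatement(
                SQL, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(fetchSize); // hint: rows per network round trip
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    handler.accept(rs); // process the current row; keep no reference to it
                    count++;
                }
            }
        }
        conn.commit(); // end the read transaction and release the cursor
        return count;
    }
}
```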
Can anyone recommend the best way to handle this situation?
Use threads, fetching 1,000 records per thread and processing them in parallel?
(OR)
Use some other batch mechanism to do this (I know there is Spring Batch, but I have never used it)
(OR)
Any other ideas?
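The first option could look roughly like the sketch below. It assumes a numeric, roughly contiguous primary key `id` (hypothetical), splits that key range into chunks of 1,000 rather than paging with OFFSET, and runs a hypothetical `processRange` callback for each chunk on a fixed-size thread pool. This bounds how much each thread holds in memory, but every row still makes a round trip to the JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongBinaryOperator;

class ChunkedParallelJob {

    /** Splits the inclusive id range [minId, maxId] into ranges of at most chunkSize ids. */
    static List<long[]> chunks(long minId, long maxId, long chunkSize) {
        List<long[]> ranges = new ArrayList<>();
        for (long from = minId; from <= maxId; from += chunkSize) {
            ranges.add(new long[] {from, Math.min(from + chunkSize - 1, maxId)});
        }
        return ranges;
    }

    /**
     * Runs processRange(from, to) for every chunk on a fixed-size pool and sums
     * the row counts it returns. processRange is a hypothetical callback; in a
     * real job it would use its own connection and a query restricted with
     * "WHERE id BETWEEN ? AND ?", so each thread holds one chunk at a time.
     */
    static long runParallel(long minId, long maxId, long chunkSize, int threads,
                            LongBinaryOperator processRange) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (long[] range : chunks(minId, maxId, chunkSize)) {
                Callable<Long> task = () -> processRange.applyAsLong(range[0], range[1]);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // rethrows (wrapped) any exception thrown by a chunk
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```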
Answer
"...a batch job that fetches rows from a database table and, based on certain conditions, writes to other tables or updates the row with a certain value."
This sounds like the sort of thing you should do inside the database. For example, to fetch a particular row and update it based on certain conditions, SQL has the UPDATE ... WHERE ... statement. To write to another table, you can use an INSERT ... SELECT ... statement.
These statements can get fairly complicated, but I suggest doing everything in your power to keep this work inside the database: pulling the data out just to filter it is much slower, because every row has to be sent over the network and processed in the JVM, and it defeats the purpose of having a relational database.
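For illustration, driving this approach from Java might look like the sketch below: the job sends two set-based statements and never fetches a row into the JVM. The schema (`source_records`, `flagged_records`, the `status` and `amount` columns) and the condition are hypothetical placeholders for the real business rules.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class SetBasedJob {

    // Hypothetical schema: source_records(id, status, amount), flagged_records(id, amount).
    // INSERT ... SELECT copies qualifying rows without them leaving the database.
    static final String COPY_SQL =
            "INSERT INTO flagged_records (id, amount) "
          + "SELECT id, amount FROM source_records "
          + "WHERE status = 'NEW' AND amount > 10000";

    // UPDATE ... WHERE marks the same rows in place.
    static final String MARK_SQL =
            "UPDATE source_records SET status = 'FLAGGED' "
          + "WHERE status = 'NEW' AND amount > 10000";

    /** Runs both statements in one transaction and returns {rowsCopied, rowsUpdated}. */
    static int[] run(Connection conn) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (Statement stmt = conn.createStatement()) {
            // Order matters: copy first, while the rows still match status = 'NEW'.
            int copied = stmt.executeUpdate(COPY_SQL);
            int updated = stmt.executeUpdate(MARK_SQL);
            conn.commit();
            return new int[] {copied, updated};
        } catch (SQLException e) {
            conn.rollback(); // leave both tables unchanged if either statement fails
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Running both statements in one transaction keeps the tables consistent if either fails. Under concurrent writes, a row that starts matching between the two statements could be updated without being copied, so a stricter isolation level or an explicit lock may be needed, depending on the database.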
Note: Make sure to experiment with this on a non-production system first, and implement whatever limits you need (for example, processing in smaller batches) so you don't lock up production tables at busy times.
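One way to implement such limits is sketched below, assuming a hypothetical `source_records` table with a numeric `id` key: the update is applied one `id` range at a time with a commit after each range, so locks are held only briefly, and an optional pause spaces the chunks out. The chunk size and pause are tuning assumptions, not recommendations.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class ThrottledUpdate {

    // Hypothetical statement. The status = 'NEW' filter makes a re-run after a
    // failure safe: rows committed by earlier chunks no longer match.
    static final String MARK_RANGE_SQL =
            "UPDATE source_records SET status = 'FLAGGED' "
          + "WHERE status = 'NEW' AND amount > 10000 AND id BETWEEN ? AND ?";

    /**
     * Applies the update one id range at a time and commits after each range,
     * so row locks are held for one chunk only. pauseMillis optionally leaves
     * room for other sessions between chunks. Returns the rows updated.
     */
    static long run(Connection conn, long minId, long maxId, long chunkSize, long pauseMillis)
            throws SQLException, InterruptedException {
        conn.setAutoCommit(false);
        long total = 0;
        try (PreparedStatement ps = conn.prepareStatement(MARK_RANGE_SQL)) {
            for (long from = minId; from <= maxId; from += chunkSize) {
                ps.setLong(1, from);
                ps.setLong(2, Math.min(from + chunkSize - 1, maxId));
                total += ps.executeUpdate();
                conn.commit(); // release this chunk's locks before starting the next
                if (pauseMillis > 0) {
                    Thread.sleep(pauseMillis);
                }
            }
        } catch (SQLException e) {
            conn.rollback(); // undoes only the chunk in flight; earlier chunks stay committed
            throw e;
        }
        return total;
    }
}
```

Because each chunk commits independently, a failure leaves earlier chunks applied; the `status = 'NEW'` condition is what makes simply re-running the job safe.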