Processing millions of database records in Java


Problem description

I need to write a batch job that fetches rows from a database table and, based on certain conditions, either writes to other tables or updates the row with a certain value. We are using Spring and JDBC to fetch the result set, and a standalone Java program, scheduled to run weekly, iterates through and processes the records. I know this is not the right way to do it, but we had to do it as a temporary solution. As the records grow into the millions, we will end up with out-of-memory exceptions, so I know this is not the best approach.

Can anyone recommend the best way to deal with this situation?

Use threads, fetch 1000 records per thread, and process them in parallel?

(OR)

Use some other batch mechanism to do this (I know there is spring-batch, but I have never used it)

(OR)

Any other ideas?

Solution

a batch job that fetches rows from a database table and, based on certain conditions, either writes to other tables or updates the row with a certain value.

This sounds like the sort of thing you should do inside the database. For example, to fetch a particular row and update it based on certain conditions, SQL has the UPDATE ... WHERE ... statement. To write to another table, you can use INSERT ... SELECT ....
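As a rough illustration of the two statements mentioned above, here is a minimal JDBC sketch that lets the database do the filtering. The tables (orders, orders_archive), their columns, the status values, and the class and method names are all hypothetical, invented for this example; substitute your own schema and conditions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

final class InDatabaseBatch {

    // Hypothetical schema: orders(id, status, created_at) and
    // orders_archive(id, status, created_at). Adjust to your own tables.
    static final String ARCHIVE_SQL =
        "INSERT INTO orders_archive (id, status, created_at) "
      + "SELECT id, status, created_at FROM orders "
      + "WHERE status = ? AND created_at < ?";

    static final String MARK_SQL =
        "UPDATE orders SET status = ? WHERE status = ? AND created_at < ?";

    /**
     * Copies the matching rows to the archive table and then updates them,
     * both in a single transaction. No rows are pulled into the JVM.
     * Returns the number of rows updated.
     */
    static int archiveAndMark(Connection conn, Timestamp cutoff) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement insert = conn.prepareStatement(ARCHIVE_SQL);
             PreparedStatement update = conn.prepareStatement(MARK_SQL)) {
            // INSERT ... SELECT: write the matching rows to another table.
            insert.setString(1, "OPEN");
            insert.setTimestamp(2, cutoff);
            insert.executeUpdate();

            // UPDATE ... WHERE: change the same rows in place.
            update.setString(1, "ARCHIVED");
            update.setString(2, "OPEN");
            update.setTimestamp(3, cutoff);
            int updated = update.executeUpdate();

            conn.commit();
            return updated;
        } catch (SQLException e) {
            conn.rollback(); // undo both statements if either one fails
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Because only row counts travel back to the Java process, memory use stays flat no matter how many rows match.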

These statements may get fairly complicated, but I suggest doing everything in your power to do this inside the database, since pulling the data out to filter it is incredibly slow and defeats the purpose of having a relational database.

Note: Make sure to experiment with this on a non-production system first, and implement any limits you need so you don't lock up production tables at bad times.
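One possible way to implement such limits is to run the update in key-range chunks and commit after each chunk, so that no single statement holds locks on millions of rows at once. The sketch below assumes a numeric primary key named id on a hypothetical orders table; the class name, method names, and status values are made up for illustration, and how effective this is depends on your database's locking behavior.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class ChunkedUpdate {

    /** Splits the inclusive range [minId, maxId] into inclusive ranges of at most chunkSize ids. */
    static List<long[]> keyRanges(long minId, long maxId, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += chunkSize) {
            long end = Math.min(start + chunkSize - 1, maxId);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    /**
     * Runs the update one key range at a time, committing after each range
     * so locks are released between chunks. Returns the total rows updated.
     * The table and column names are hypothetical.
     */
    static long updateInChunks(Connection conn, long minId, long maxId, long chunkSize)
            throws SQLException {
        String sql = "UPDATE orders SET status = ? WHERE status = ? AND id BETWEEN ? AND ?";
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        long total = 0;
        try (PreparedStatement update = conn.prepareStatement(sql)) {
            for (long[] range : keyRanges(minId, maxId, chunkSize)) {
                update.setString(1, "ARCHIVED");
                update.setString(2, "OPEN");
                update.setLong(3, range[0]);
                update.setLong(4, range[1]);
                total += update.executeUpdate();
                conn.commit(); // end the transaction for this chunk
            }
            return total;
        } catch (SQLException e) {
            conn.rollback(); // only the current, uncommitted chunk is undone
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

For example, keyRanges(1, 2500, 1000) yields the ranges 1 to 1000, 1001 to 2000, and 2001 to 2500. Note that chunks committed before a failure stay committed, so the job should be safe to re-run.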
