Buffering DB inserts in a multithreaded program

Question

I have a system which breaks a large task into small tasks using about 30 threads at a time. As each individual thread finishes, it persists its calculated results to the database. What I want to achieve is to have each thread pass its results to a new persistence class that will perform a type of double buffering and data persistence while running in its own thread.

For example, after 100 threads have moved their data to the buffer, the persistence class swaps the buffers and persists all 100 entries to the database. This would allow the use of prepared statements and thus cut way down on the I/O between the program and the database.

Is there a pattern or good example of this type of multithreaded double buffering?

Recommended answer

I've seen this pattern referred to as asynchronous database writing or the write-behind pattern. It's a typical pattern supported by distributed cache products (Terracotta, Coherence, GigaSpaces, ...) because you don't want your cache updates to also include writing the change to the underlying database.

The complexity of this pattern depends on your tolerance for lost database updates. Because of the delay between completing the work and writing the result to the database, you can lose updates due to bugs, power failures, ... (you get the picture).

I'd suggest putting completed results on some sort of queue and then writing them to the DB in batches of 100 (using your example) OR after a set amount of time, whichever comes first. The reason for also using a time delay is to cope with result sets that aren't divisible by 100: without it, the final partial batch would sit in the queue and never be written.
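The queue-plus-timeout idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: `BatchingWriter`, `submit`, `close`, and the `sink` callback are hypothetical names that do not come from the original answer, the queue is an unbounded in-process `LinkedBlockingQueue`, and error handling for a failing sink is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: worker threads submit results, one writer thread flushes them in batches. */
final class BatchingWriter<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final int batchSize;
    private final long maxWaitNanos;
    private final Consumer<List<T>> sink; // assumption: performs the actual DB write
    private final Thread writer = new Thread(this::run, "batch-writer");
    private volatile boolean running = true;

    BatchingWriter(int batchSize, long maxWaitMillis, Consumer<List<T>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.maxWaitNanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        this.sink = sink;
    }

    void start() { writer.start(); }

    /** Called by worker threads; does not block because the queue is unbounded. */
    void submit(T result) { queue.add(result); }

    /** Flushes everything submitted so far, then stops. Do not call submit() afterwards. */
    void close() {
        running = false;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void run() {
        List<T> batch = new ArrayList<>(batchSize);
        long deadline = 0L; // set when the first item of a batch arrives
        try {
            while (running || !queue.isEmpty()) {
                // With an empty batch, wake up periodically just to re-check the running flag.
                long wait = batch.isEmpty() ? maxWaitNanos : deadline - System.nanoTime();
                T item = queue.poll(Math.max(wait, 0L), TimeUnit.NANOSECONDS);
                if (item != null) {
                    if (batch.isEmpty()) deadline = System.nanoTime() + maxWaitNanos;
                    batch.add(item);
                }
                boolean full = batch.size() >= batchSize;
                boolean expired = !batch.isEmpty() && System.nanoTime() - deadline >= 0;
                if (full || expired) {
                    sink.accept(batch);
                    batch = new ArrayList<>(batchSize); // fresh list so the sink may keep the old one
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            if (!batch.isEmpty()) sink.accept(batch); // write the final partial batch
        }
    }
}
```

A sink for a relational database could, for instance, loop over the batch calling `PreparedStatement.addBatch()` and finish with `executeBatch()`. Anything still in the queue when the process dies is lost, which is the durability trade-off this pattern carries.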

If you have no requirements for resilience/durability, then you can do all this in the same process. If, however, you can't tolerate any loss, then you can replace the in-VM queue with a persistent JMS queue (slower but safer).
