Tutorial about using multi-threading in JDBC
Question
Our company has a batch application that runs every day. It mostly does database-related jobs, for example importing data from a file into a database table.
There are 20+ tasks defined in that application, and each one may or may not depend on others. The application executes the tasks one by one, and the whole application runs in a single thread.
It takes 3 to 7 hours to finish all the tasks. I think that's too long, so maybe I can improve performance with multi-threading.
Since there are dependencies between tasks, I think it's not a good idea (or at least not easy) to run the tasks in parallel, but maybe I can use multi-threading to improve performance inside a task.
For example, we have a task called "ImportBizData", which copies data from a data file (usually 1,000,000+ rows) into a database table. Is it worth using multi-threading for this?
Since I know only a little about multi-threading, I hope someone can provide some tutorial links on this topic.
Recommended answer
Multi-threading will improve your performance, but there are a couple of things you need to know:
- Each thread needs its own JDBC connection. Connections can't be shared between threads because each connection is also a transaction.
- Upload the data in chunks and commit once in a while to avoid accumulating huge rollback/undo tables.
- Cut tasks into several work units where each unit does one job.
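A minimal sketch of the first two points, assuming a hypothetical table `biz_data(id, payload)`, a hypothetical `BizRow` class for one parsed line, and a JDBC URL supplied by the caller. Each worker opens its own `Connection`, turns off auto-commit, sends each chunk with `addBatch`/`executeBatch`, and commits after every chunk. The chunk size is a tuning knob, not a recommended value.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical: one parsed line of the data file.
final class BizRow {
    final long id;
    final String payload;

    BizRow(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

class ChunkedLoader {
    // Splits a list into consecutive chunks of at most `size` elements (views, not copies).
    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            chunks.add(items.subList(start, Math.min(start + size, items.size())));
        }
        return chunks;
    }

    // Inserts rows on a connection owned by the calling thread, committing after every
    // chunk so the database never holds one huge uncommitted transaction.
    // The table name and columns are assumptions.
    static void insertInChunks(String jdbcUrl, List<BizRow> rows, int chunkSize) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO biz_data (id, payload) VALUES (?, ?)")) {
                for (List<BizRow> part : chunk(rows, chunkSize)) {
                    for (BizRow r : part) {
                        ps.setLong(1, r.id);
                        ps.setString(2, r.payload);
                        ps.addBatch();
                    }
                    ps.executeBatch();  // send the whole chunk as one batch
                    conn.commit();      // earlier chunks stay committed if a later one fails
                }
            } catch (SQLException e) {
                conn.rollback();        // undo only the chunk that was in flight
                throw e;
            }
        }
    }
}
```

Each upload thread would call `insertInChunks` with its own slice of rows, so no `Connection` object ever crosses threads.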
To elaborate on the last point: currently, you have a task that reads a file, parses it, opens a JDBC connection, does some calculations, sends the data to the database, etc.
What you should do:
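- One (!) thread reads the file and creates "jobs" out of it. Each job should contain a small, but not too small, unit of work. Push them into a queue.
- The next thread waits for jobs in the queue and does the calculations. This can happen while the thread from step #1 waits for the slow hard disk to return new lines of data. The result of this conversion step goes into the next queue.
- One or more threads upload the data via JDBC.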
The first and the last threads are pretty slow because they are I/O bound (hard disks are slow, and network connections are even worse). Plus, inserting data into a database is a very complex task (allocating space, updating indexes, checking foreign keys).
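The three-step pipeline can be sketched with `BlockingQueue`s and an `ExecutorService`. This is an illustration under assumptions, not the answer's own code: the input is an in-memory list of lines instead of a file, the `ImportPipeline` class and its parameters are hypothetical, and `upload` stands in for a JDBC batch insert that each uploader thread would run on its own connection. Shutdown uses one "poison pill" sentinel per consumer, and the thread count of each step is a parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

class ImportPipeline {
    // Identity-compared sentinel ("poison pill"): tells a worker no more jobs will come.
    private static final List<?> POISON = new ArrayList<>();

    @SuppressWarnings("unchecked")
    private static <T> List<T> poison() {
        return (List<T>) POISON;
    }

    static <R> void run(List<String> lines, int jobSize,
                        Function<String, R> transform, int transformers,
                        Consumer<List<R>> upload, int uploaders) throws InterruptedException {
        // Bounded queues give back-pressure: a fast step blocks instead of filling the heap.
        BlockingQueue<List<String>> rawJobs = new ArrayBlockingQueue<>(64);
        BlockingQueue<List<R>> readyJobs = new ArrayBlockingQueue<>(64);
        AtomicInteger liveTransformers = new AtomicInteger(transformers);
        ExecutorService pool = Executors.newFixedThreadPool(1 + transformers + uploaders);

        // Step 1: exactly one reader cuts the input into jobs of `jobSize` lines.
        pool.submit(() -> {
            for (int i = 0; i < lines.size(); i += jobSize) {
                rawJobs.put(new ArrayList<>(lines.subList(i, Math.min(i + jobSize, lines.size()))));
            }
            for (int t = 0; t < transformers; t++) rawJobs.put(poison()); // one pill per transformer
            return null;
        });

        // Step 2: transformers do the CPU work while the reader waits on I/O.
        for (int t = 0; t < transformers; t++) {
            pool.submit(() -> {
                for (List<String> job = rawJobs.take(); job != POISON; job = rawJobs.take()) {
                    List<R> out = new ArrayList<>(job.size());
                    for (String line : job) out.add(transform.apply(line));
                    readyJobs.put(out);
                }
                // The last transformer to finish tells every uploader to stop.
                if (liveTransformers.decrementAndGet() == 0) {
                    for (int u = 0; u < uploaders; u++) readyJobs.put(poison());
                }
                return null;
            });
        }

        // Step 3: uploaders; in real code each would hold its own JDBC connection.
        for (int u = 0; u < uploaders; u++) {
            pool.submit(() -> {
                for (List<R> job = readyJobs.take(); job != POISON; job = readyJobs.take()) {
                    upload.accept(job);
                }
                return null;
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

Error handling is deliberately left out: if a `transform` or `upload` call throws, the pills are never forwarded and downstream workers stay blocked on `take()`, so a production version would check the `Future`s returned by `submit` and cancel the pipeline on failure.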
Using different worker threads gives you lots of advantages:
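- It's easy to test each thread separately. Since they don't share data, no synchronization is needed. The queues will do that for you.
- You can quickly change the number of threads for each step to tune performance.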
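To illustrate both advantages, here is a sketch of a single step written as a `Callable` that only talks to its input and output queues, so a test can fill a queue, call it directly on the test thread, and inspect the result without a database or a thread pool. The `Stage` class and the `import.<step>.threads` system-property names are hypothetical; `Integer.getInteger` is the standard JDK call for reading an integer system property with a default.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.function.Function;

// One pipeline step, written so it can run on a pool thread or be called directly in a test.
class Stage<I, O> implements Callable<Integer> {
    private final BlockingQueue<I> in;
    private final BlockingQueue<O> out;
    private final I endOfInput;            // sentinel job; each running copy needs its own
    private final Function<I, O> work;

    Stage(BlockingQueue<I> in, BlockingQueue<O> out, I endOfInput, Function<I, O> work) {
        this.in = in;
        this.out = out;
        this.endOfInput = endOfInput;
        this.work = work;
    }

    // Moves jobs from `in` to `out` until the sentinel arrives; returns how many it handled.
    @Override
    public Integer call() throws InterruptedException {
        int handled = 0;
        for (I job = in.take(); !job.equals(endOfInput); job = in.take()) {
            out.put(work.apply(job));
            handled++;
        }
        return handled;
    }

    // Thread count for a step, read from a system property such as
    // -Dimport.upload.threads=4 (the property names are hypothetical).
    static int threadsFor(String step, int defaultCount) {
        return Integer.getInteger("import." + step + ".threads", defaultCount);
    }
}
```

In the pipeline you would submit `threadsFor("upload", 2)` copies of the upload step to the `ExecutorService` (each copy consuming its own sentinel), so passing `-Dimport.upload.threads=4` on the command line retunes that step without touching the code.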