Is JDBC multi-threaded insert possible?

Question

I'm currently working on a Java project for which I need to prepare a big (to me) MySQL database. I have to do web scraping with Jsoup and store the results in the database as well. By my estimate, I will have roughly 1,500,000 to 2,000,000 records to insert. In my first trial I just used a loop to insert the records, and it took one week to insert about a third of them, which I think is too slow. Is it possible to make this process multi-threaded, so that I can split my records into 3 sets, say 500,000 records per set, and insert them into one database (one table, specifically)?

Recommended answer

Multi-threading isn't going to help you here. You'll just move the contention bottleneck from your app server to the database.

Instead, try using batch inserts; they generally make this sort of thing orders of magnitude faster. See "3.4 Making Batch Updates" in the JDBC tutorial.
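As a rough illustration of what a batch insert can look like with plain JDBC, here is a minimal sketch. The table `pages`, its columns `url` and `title`, the class name and the batch size of 1000 are all assumptions made for the example, not details from the question. Only standard `java.sql` calls are used (`addBatch`, `executeBatch`, `setAutoCommit`, `commit`).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsertSketch {
    // Hypothetical table and columns; adjust to the real schema.
    private static final String SQL = "INSERT INTO pages (url, title) VALUES (?, ?)";
    // Assumed batch size; the best value has to be found by measuring.
    private static final int BATCH_SIZE = 1000;

    /**
     * Inserts all rows in a single transaction, sending them to the server
     * in batches of BATCH_SIZE. Each row is {url, title}.
     * Returns the number of rows submitted.
     */
    public static int insertAll(Connection conn, List<String[]> rows) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // avoid one commit per inserted row
        int pending = 0;
        int total = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch(); // queue the row on the client side
                total++;
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch(); // send the queued rows together
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // send the final, partially filled batch
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // leave the table unchanged if anything failed
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return total;
    }
}
```

With MySQL Connector/J, the `rewriteBatchedStatements=true` connection property is commonly enabled so the driver combines batched inserts into multi-row statements; whether it helps in a given setup is worth measuring.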

Edit: As @Jon commented, you need to decouple the fetching of the web pages from their insertion into the database, otherwise the whole process will go at the speed of the slowest operation. You could have multiple threads fetching web pages, which add the data to a queue data structure, and then have a single thread draining the queue into the database using a batch insert.
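The fetch/insert split described in the edit could be sketched roughly as follows. This is only an outline under stated assumptions: `ScrapePipelineSketch`, `run`, the `fetch` function and the `writeBatch` callback are hypothetical names invented for the example. In a real program `fetch` would wrap the Jsoup scraping and `writeBatch` would perform a JDBC batch insert. Error handling is deliberately minimal: if `writeBatch` throws, the writer thread stops and the pipeline would need extra handling to shut down cleanly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

public class ScrapePipelineSketch {
    // Unique sentinel object, compared by identity, that tells the writer to stop.
    private static final String POISON = new String("<<end-of-input>>");

    /**
     * Fetches every URL on a pool of fetcherThreads threads and hands the
     * results to a single writer thread, which passes them to writeBatch in
     * groups of at most batchSize. Returns the number of items written.
     */
    public static int run(List<String> urls, int fetcherThreads, int batchSize,
                          Function<String, String> fetch,
                          Consumer<List<String>> writeBatch) throws InterruptedException {
        // Bounded queue (assumed capacity) so fetchers cannot run far ahead of the writer.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);
        AtomicInteger written = new AtomicInteger();

        // Single consumer: the only thread that talks to the database.
        Thread writer = new Thread(() -> {
            List<String> batch = new ArrayList<>(batchSize);
            try {
                while (true) {
                    String item = queue.take();
                    if (item == POISON) {
                        break;
                    }
                    batch.add(item);
                    if (batch.size() >= batchSize) {
                        writeBatch.accept(new ArrayList<>(batch));
                        written.addAndGet(batch.size());
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) { // flush the final partial batch
                    writeBatch.accept(new ArrayList<>(batch));
                    written.addAndGet(batch.size());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Multiple producers: each task fetches one page and queues the result.
        ExecutorService fetchers = Executors.newFixedThreadPool(fetcherThreads);
        for (String url : urls) {
            fetchers.execute(() -> {
                try {
                    queue.put(fetch.apply(url));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        fetchers.shutdown();
        fetchers.awaitTermination(1, TimeUnit.HOURS); // assumed upper bound on fetch time
        queue.put(POISON); // all producers are done; let the writer finish
        writer.join();
        return written.get();
    }
}
```

Keeping a single writer means the database sees one stream of large batches rather than several competing connections, while the slow network fetches still run in parallel.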
