Bulk insert in Java using prepared statements batch update


Problem description

I am trying to fill a resultSet in Java with about 50,000 rows of 10 columns and then inserting them into another table using the batchExecute method of PreparedStatement.

To make the process faster I did some research and found that while reading data into resultSet the fetchSize plays an important role.

Having a very low fetchSize can result into too many trips to the server and a very high fetchSize can block the network resources, so I experimented a little bit and set up an optimum size that suits my infrastructure.
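For example, the fetch size hint can be set directly on the statement before executing the query. This is only a sketch: the table, column names, and value passed in are illustrative, and the right size depends on your driver, row width, and network.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SourceReader {
    // Hypothetical query; substitute your real source table and columns.
    static final String SELECT_SQL = "SELECT email, id FROM source_table";

    static void readSource(Connection conn, int fetchSize) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SELECT_SQL)) {
            // Hint to the driver: fetch this many rows per server round trip.
            // Too low means many network trips; too high ties up memory.
            ps.setFetchSize(fetchSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String email = rs.getString(1); // column 1: email
                    long id = rs.getLong(2);        // column 2: id
                    // ... hand each row off to the insert side ...
                }
            }
        }
    }
}
```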

I am reading this resultSet and creating insert statements to insert into another table of a different database.

Something like this (just a sample, not real code):

for (int i = 0; i < 50000; i++) {
    statement.setString(1, "a@a.com");
    statement.setLong(2, 1);
    statement.addBatch();
}
statement.executeBatch();




  • Will the executeBatch method try to send all the data at once?

  • Is there a way to define the batch size?

  • Is there a better way to speed up the bulk insert process?

  • While updating in bulk (50,000 rows, 10 cols), is it better to use an updatable ResultSet or a PreparedStatement with batch execution?

    Recommended answer

    I'll address your questions in turn.


    • Will the executeBatch method try to send all the data at once?

    This can vary with each JDBC driver, but the few I've studied will iterate over each batch entry and send the arguments together with the prepared statement handle each time to the database for execution. That is, in your example above, there would be 50,000 executions of the prepared statement with 50,000 pairs of arguments, but these 50,000 steps can be done in a lower-level "inner loop," which is where the time savings come in. As a rather stretched analogy, it's like dropping out of "user mode" down into "kernel mode" and running the entire execution loop there. You save the cost of diving in and out of that lower-level mode for each batch entry.


    • Is there a way to define the batch size?

    You've defined it implicitly here by pushing 50,000 argument sets in before executing the batch via Statement#executeBatch(). A batch size of one is just as valid.
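If holding all 50,000 argument sets in driver memory is a concern, you can impose an explicit batch size yourself by flushing every N rows. A sketch, with hypothetical table and column names and an illustrative chunk size:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ChunkedInserter {
    static final int CHUNK = 1_000; // illustrative; tune to your setup

    // Number of non-empty flushes needed for totalRows rows at a given chunk size.
    static int flushCount(int totalRows, int chunk) {
        return (totalRows + chunk - 1) / chunk;
    }

    static void insertAll(Connection conn, int totalRows) throws SQLException {
        // Hypothetical target table and columns.
        String sql = "INSERT INTO target_table (email, id) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < totalRows; i++) {
                ps.setString(1, "a@a.com");
                ps.setLong(2, 1);
                ps.addBatch();
                if ((i + 1) % CHUNK == 0) {
                    ps.executeBatch(); // flush a chunk, freeing buffered arguments
                }
            }
            ps.executeBatch(); // flush any remainder (an empty batch is a no-op)
        }
    }
}
```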


    • Is there any better way to speed up the process of bulk insertion?

    Consider opening a transaction explicitly before the batch insertion, and commit it afterward. Don't let either the database or the JDBC driver impose a transaction boundary around each insertion step in the batch. You can control the JDBC layer with the Connection#setAutoCommit(boolean) method. Take the connection out of auto-commit mode first, then populate your batches, start a transaction, execute the batch, then commit the transaction via Connection#commit().
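A sketch of that pattern, assuming conn is the connection to the target database (the table and column names are hypothetical):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransactionalBatch {
    // Hypothetical target table and columns.
    static final String INSERT_SQL = "INSERT INTO target_table (email, id) VALUES (?, ?)";

    static void insertInOneTransaction(Connection conn) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one transaction around the whole batch
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (int i = 0; i < 50_000; i++) {
                ps.setString(1, "a@a.com");
                ps.setLong(2, 1);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit(); // commit once, not once per insert
        } catch (SQLException e) {
            conn.rollback(); // discard partial work on failure
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit); // restore the previous mode
        }
    }
}
```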

    This advice assumes that your insertions won't be contending with concurrent writers, and assumes that these transaction boundaries will give you sufficiently consistent values read from your source tables for use in the insertions. If that's not the case, favor correctness over speed.


    • Is it better to use an updatable ResultSet or a PreparedStatement with batch execution?

    Nothing beats testing with your JDBC driver of choice, but I expect the latter, PreparedStatement with Statement#executeBatch(), will win out here. The statement handle may have an associated list or array of "batch arguments," with each entry being an argument set supplied via Statement#addBatch() since the last call to Statement#executeBatch() (or Statement#clearBatch()). The list grows with each call to addBatch(), and is not flushed until you call executeBatch(). Hence, the Statement instance is really acting as an argument buffer; you're trading memory for convenience (using the Statement instance in lieu of your own external argument set buffer).

    Again, you should consider these answers general and speculative so long as we're not discussing a specific JDBC driver. Each driver varies in sophistication, and each will vary in which optimizations it pursues.
