Spring Data JPA: Batch insert for nested entities


Problem description

I have a test case where I need to persist 100'000 entity instances into the database. The code I'm currently using does this, but it takes up to 40 seconds until all the data is persisted in the database. The data is read from a JSON file which is about 15 MB in size.

Now I had already implemented a batch insert method in a custom repository before for another project. However, in that case I had a lot of top level entities to persist, with only a few nested entities.

In my current case I have 5 Job entities that contain a List of about ~30 JobDetail entities. One JobDetail contains between 850 and 1100 JobEnvelope entities.

When writing to the database I commit the List of Job entities with the default save(Iterable&lt;Job&gt; jobs) interface method. All nested entities have the CascadeType PERSIST. Each entity has its own table.

The usual way to enable batch inserts would be to implement a custom method like saveBatch that flushes every once in a while. But my problem in this case are the JobEnvelope entities. I don't persist them with a JobEnvelope repository; instead I let the repository of the Job entity handle it. I'm using MariaDB as the database server.

So my question boils down to the following: How can I make the JobRepository insert its nested entities in batches?

These are my 3 entities in question:

@Entity
public class Job {
  @Id
  @GeneratedValue
  private int jobId;

  @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "job")
  @JsonManagedReference
  private Collection<JobDetail> jobDetails;
}

JobDetail

@Entity
public class JobDetail {
  @Id
  @GeneratedValue
  private int jobDetailId;

  @ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
  @JoinColumn(name = "jobId")
  @JsonBackReference
  private Job job;

  @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "jobDetail")
  @JsonManagedReference
  private List<JobEnvelope> jobEnvelopes;
}

JobEnvelope

@Entity
public class JobEnvelope {
  @Id
  @GeneratedValue
  private int jobEnvelopeId;

  @ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
  @JoinColumn(name = "jobDetailId")
  private JobDetail jobDetail;

  private double weight;
}

Recommended answer

Make sure to configure Hibernate batch-related properties properly:

<property name="hibernate.jdbc.batch_size">100</property>
<property name="hibernate.order_inserts">true</property>
<property name="hibernate.order_updates">true</property>

The point is that successive statements can only be batched if they target the same table. As soon as a statement inserts into a different table, the batch built so far must be closed and executed before that statement. With the hibernate.order_inserts property you allow Hibernate to reorder inserts before constructing batch statements (hibernate.order_updates has the same effect for update statements).

hibernate.jdbc.batch_size is the maximum batch size that Hibernate will use. Try and analyze different values and pick the one that shows the best performance in your use cases.

Note that batching of insert statements is disabled if the IDENTITY id generator is used.
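
Since the entities above use a plain @GeneratedValue (which may resolve to IDENTITY depending on the Hibernate version and dialect), switching to a sequence-based generator is one way to keep insert batching enabled. A minimal sketch, assuming MariaDB 10.3+ where sequences are available; the generator name and allocationSize are illustrative:

@Entity
public class Job {
  @Id
  @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "job_seq")
  @SequenceGenerator(name = "job_seq", sequenceName = "job_seq", allocationSize = 50)
  private int jobId;

  // ... associations as before
}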

Specific to MySQL, you have to specify rewriteBatchedStatements=true as part of the connection URL. To make sure that batching is working as expected, add profileSQL=true to inspect the SQL the driver sends to the database. More details here.
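
For example, a JDBC URL with both options might look like this (host, port, and database name are placeholders; if you use the MariaDB Connector/J rather than the MySQL driver, check which of these options your driver version actually supports):

jdbc:mysql://localhost:3306/jobdb?rewriteBatchedStatements=true&profileSQL=true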

If your entities are versioned (for optimistic locking purposes), then in order to utilize batch updates (this doesn't affect inserts) you will also have to turn on:

<property name="hibernate.jdbc.batch_versioned_data">true</property>

With this property you tell Hibernate that the JDBC driver is capable of returning the correct count of affected rows when executing a batch update (which is needed to perform the version check). You have to check whether this works properly for your database/JDBC driver. For example, it does not work in Oracle 11 and older Oracle versions.

You may also want to flush and clear the persistence context after each batch to release memory; otherwise all of the managed objects remain in the persistence context until it is closed.
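
A minimal sketch of such a batching method in a custom repository, assuming an injected EntityManager; the class and method names are illustrative, and the batch size should match hibernate.jdbc.batch_size:

import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

@Repository
public class JobBatchRepository {

  @PersistenceContext
  private EntityManager entityManager;

  // keep in sync with hibernate.jdbc.batch_size
  private static final int BATCH_SIZE = 100;

  @Transactional
  public void saveInBatch(List<Job> jobs) {
    for (int i = 0; i < jobs.size(); i++) {
      // persist cascades to JobDetail and JobEnvelope via CascadeType.PERSIST
      entityManager.persist(jobs.get(i));
      if ((i + 1) % BATCH_SIZE == 0) {
        entityManager.flush();  // execute the pending batched inserts
        entityManager.clear();  // detach flushed entities to release memory
      }
    }
    // flush and clear whatever remains after the last full batch
    entityManager.flush();
    entityManager.clear();
  }
}

Note that with your data shape a single Job cascades to roughly 30,000 JobEnvelope rows, so it may make more sense to flush and clear after every Job rather than counting top-level entities.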

Also, you may find this blog useful as it nicely explains the details of the Hibernate batching mechanism.
