Spring Data JPA: Batch insert for nested entities


Question


I have a test case where I need to persist 100'000 entity instances into the database. The code I'm currently using does this, but it takes up to 40 seconds until all the data is persisted in the database. The data is read from a JSON file which is about 15 MB in size.

Now I had already implemented a batch insert method in a custom repository before for another project. However, in that case I had a lot of top level entities to persist, with only a few nested entities.

In my current case I have 5 Job entities that contain a List of about 30 JobDetail entities. One JobDetail contains between 850 and 1100 JobEnvelope entities.

When writing to the database I commit the List of Job entities with the default save(Iterable<Job> jobs) interface method. All nested entities have the CascadeType PERSIST. Each entity has its own table.

The usual way to enable batch inserts would be to implement a custom method like saveBatch that flushes every once in a while. But my problem in this case is the JobEnvelope entities. I don't persist them with a JobEnvelope repository; instead I let the repository of the Job entity handle it. I'm using MariaDB as database server.

So my question boils down to the following: How can I make the JobRepository insert its nested entities in batches?

These are my 3 entities in question:

Job

@Entity
public class Job {
  @Id
  @GeneratedValue
  private int jobId;

  @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "job")
  @JsonManagedReference
  private Collection<JobDetail> jobDetails;
}

JobDetail

@Entity
public class JobDetail {
  @Id
  @GeneratedValue
  private int jobDetailId;

  @ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
  @JoinColumn(name = "jobId")
  @JsonBackReference
  private Job job;

  @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "jobDetail")
  @JsonManagedReference
  private List<JobEnvelope> jobEnvelopes;
}

JobEnvelope

@Entity
public class JobEnvelope {
  @Id
  @GeneratedValue
  private int jobEnvelopeId;

  @ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
  @JoinColumn(name = "jobDetailId")
  private JobDetail jobDetail;

  private double weight;
}

Solution

Make sure to configure Hibernate batch-related properties properly:

<property name="hibernate.jdbc.batch_size">100</property>
<property name="hibernate.order_inserts">true</property>
<property name="hibernate.order_updates">true</property>

The point is that successive statements can be batched only if they target the same table. If a statement inserts into a different table, the batch built so far must be closed and executed before that statement runs. With the hibernate.order_inserts property you give Hibernate permission to reorder inserts before constructing batch statements (hibernate.order_updates has the same effect for update statements).

jdbc.batch_size is the maximum batch size that Hibernate will use. Try and analyze different values and pick the one that shows the best performance in your use case.

Note that batching of insert statements is disabled if the IDENTITY id generator is used.
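This matters here because the entities in the question use a bare @GeneratedValue; depending on the Hibernate version and dialect, that may resolve to IDENTITY on MariaDB and silently disable insert batching. A sketch of an explicit sequence-based mapping instead (MariaDB supports sequences from version 10.3; the generator and sequence names here are illustrative, not from the original code):

```java
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "job_seq")
@SequenceGenerator(name = "job_seq", sequenceName = "job_seq", allocationSize = 50)
private int jobId;
```

With an allocationSize greater than 1, Hibernate can assign ids in memory ahead of the inserts, which both enables batching and cuts database round trips.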

Specific to MySQL, you have to specify rewriteBatchedStatements=true as part of the connection URL. To make sure that batching is working as expected, add profileSQL=true to inspect the SQL the driver sends to the database.
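As an illustration, a connection URL carrying both flags might look like the line below (host, port, and schema are placeholders; whether each parameter is honored depends on the driver version, so check your MariaDB/MySQL connector's documentation):

```properties
spring.datasource.url=jdbc:mariadb://localhost:3306/mydb?rewriteBatchedStatements=true&profileSQL=true
```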

If your entities are versioned (for optimistic locking purposes), then in order to utilize batched updates (this doesn't affect inserts) you will also have to turn on:

<property name="hibernate.jdbc.batch_versioned_data">true</property>

With this property you tell Hibernate that the JDBC driver is capable of returning the correct count of affected rows when executing a batched update (which is needed to perform the version check). You have to check whether this works properly for your database/JDBC driver. For example, it does not work with Oracle 11 and older Oracle versions.

You may also want to flush and clear the persistence context after each batch to release memory, otherwise all of the managed objects remain in the persistence context until it is closed.
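The flush-and-clear rhythm can be sketched as a small helper. In a real custom repository method, persist would be entityManager::persist and flushAndClear would run entityManager.flush(); entityManager.clear(); inside a transaction; the class and method names below are illustrative, not part of Spring Data's API:

```java
import java.util.List;
import java.util.function.Consumer;

public class BatchPersister {

    // Persist items one by one, flushing and clearing the persistence
    // context after every batchSize items (and once more for any
    // remainder), so inserts go out in batches and managed entities
    // are released instead of piling up until the context closes.
    // Returns the number of flushes performed.
    public static <T> int persistInBatches(List<T> items, int batchSize,
                                           Consumer<T> persist, Runnable flushAndClear) {
        int flushes = 0;
        for (int i = 0; i < items.size(); i++) {
            persist.accept(items.get(i));
            if ((i + 1) % batchSize == 0) {
                flushAndClear.run();
                flushes++;
            }
        }
        if (items.size() % batchSize != 0) {
            flushAndClear.run();
            flushes++;
        }
        return flushes;
    }
}
```

In the question's setting, a custom JobRepository implementation would walk the Job → JobDetail → JobEnvelope graph this way; the chunk size should match hibernate.jdbc.batch_size so the in-memory flush boundary lines up with the JDBC batch boundary.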

Also, you may find this blog useful as it nicely explains the details of the Hibernate batching mechanism.
