Spring Data JPA: Batch insert for nested entities
I have a test case where I need to persist 100'000 entity instances into the database. The code I'm currently using does this, but it takes up to 40 seconds until all the data is persisted in the database. The data is read from a JSON file which is about 15 MB in size.
Now I had already implemented a batch insert method in a custom repository before for another project. However, in that case I had a lot of top level entities to persist, with only a few nested entities.
In my current case I have 5 Job entities that each contain a List of about ~30 JobDetail entities. One JobDetail contains between 850 and 1100 JobEnvelope entities.
When writing to the database I commit the List of Job entities with the default save(Iterable<Job> jobs) interface method. All nested entities have the CascadeType PERSIST. Each entity has its own table.
The usual way to enable batch inserts would be to implement a custom method like saveBatch that flushes every once in a while. But my problem in this case is the JobEnvelope entities. I don't persist them with a JobEnvelope repository; instead I let the repository of the Job entity handle it. I'm using MariaDB as the database server.
So my question boils down to the following: how can I make the JobRepository insert its nested entities in batches?
These are my 3 entities in question:
Job
@Entity
public class Job {
@Id
@GeneratedValue
private int jobId;
@OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "job")
@JsonManagedReference
private Collection<JobDetail> jobDetails;
}
JobDetail
@Entity
public class JobDetail {
@Id
@GeneratedValue
private int jobDetailId;
@ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
@JoinColumn(name = "jobId")
@JsonBackReference
private Job job;
@OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "jobDetail")
@JsonManagedReference
private List<JobEnvelope> jobEnvelopes;
}
JobEnvelope
@Entity
public class JobEnvelope {
@Id
@GeneratedValue
private int jobEnvelopeId;
@ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
@JoinColumn(name = "jobDetailId")
private JobDetail jobDetail;
private double weight;
}
Make sure to configure Hibernate batch-related properties properly:
<property name="hibernate.jdbc.batch_size">100</property>
<property name="hibernate.order_inserts">true</property>
<property name="hibernate.order_updates">true</property>
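If the application is configured through Spring Boot rather than a persistence.xml, the same settings can be passed through to Hibernate via application.properties (the spring.jpa.properties.* prefix forwards arbitrary properties to the JPA provider unchanged):

```properties
spring.jpa.properties.hibernate.jdbc.batch_size=100
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
```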
The point is that successive statements can be batched only if they manipulate the same table. If a statement inserts into a different table, the batch built so far must be closed and executed before that statement runs. With the hibernate.order_inserts property you give Hibernate permission to reorder inserts before constructing batch statements (hibernate.order_updates has the same effect for update statements).
hibernate.jdbc.batch_size is the maximum batch size that Hibernate will use. Try different values and pick the one that shows the best performance for your use case.
Note that batching of insert statements is disabled if the IDENTITY id generator is used.
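If you control the schema, one way around this limitation is a sequence- or table-based generator, which lets Hibernate assign ids in memory without a per-row round trip. A sketch of the idea (the generator name and allocationSize are illustrative, and SEQUENCE requires MariaDB 10.3 or later; on older versions a TABLE generator would be the fallback):

```java
@Entity
public class JobEnvelope {

    // With SEQUENCE (or TABLE) generation, Hibernate can pre-allocate ids
    // in blocks, so insert statements remain batchable. allocationSize is
    // worth tuning together with hibernate.jdbc.batch_size.
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "job_envelope_seq")
    @SequenceGenerator(name = "job_envelope_seq", sequenceName = "job_envelope_seq",
                       allocationSize = 100)
    private int jobEnvelopeId;

    // ... rest of the mapping as above
}
```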
Specific to MySQL, you have to specify rewriteBatchedStatements=true as part of the connection URL. To make sure that batching is working as expected, add profileSQL=true to inspect the SQL the driver sends to the database.
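With MySQL Connector/J, such a URL might look like the following (host, port, and database name are placeholders; note that MariaDB's own Connector/J driver implements batching differently and has its own options):

```
jdbc:mysql://localhost:3306/jobdb?rewriteBatchedStatements=true&profileSQL=true
```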
If your entities are versioned (for optimistic locking purposes), then in order to utilize batch updates (this doesn't affect inserts) you will also have to turn on:
<property name="hibernate.jdbc.batch_versioned_data">true</property>
With this property you tell Hibernate that the JDBC driver is capable of returning the correct count of affected rows when executing a batch update (needed to perform the version check). You have to check whether this works properly for your database/JDBC driver. For example, it does not work for Oracle 11 and older Oracle versions.
You may also want to flush and clear the persistence context after each batch to release memory, otherwise all of the managed objects remain in the persistence context until it is closed.
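A minimal sketch of such a custom save method, assuming a Spring-managed EntityManager (the class and method names here are illustrative, not part of the question's code):

```java
import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.transaction.annotation.Transactional;

public class JobRepositoryImpl {

    // Keep in sync with hibernate.jdbc.batch_size
    private static final int BATCH_SIZE = 100;

    @PersistenceContext
    private EntityManager em;

    @Transactional
    public void saveInBatches(List<Job> jobs) {
        int count = 0;
        for (Job job : jobs) {
            em.persist(job); // cascades to JobDetail and JobEnvelope via PERSIST
            if (++count % BATCH_SIZE == 0) {
                em.flush(); // execute the pending batched inserts
                em.clear(); // detach entities so the context does not grow unboundedly
            }
        }
        // any remaining entities are flushed when the transaction commits
    }
}
```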
Also, you may find this blog post useful, as it nicely explains the details of Hibernate's batching mechanism.