spring-data JPA: massively improve insert performance


Question



In my application I need to massively improve insert performance. Example: a file with about 21K records takes over 100 minutes to insert. There are reasons it can take some time, maybe 20 minutes or so, but over 100 minutes is just too long.

Data is inserted into 3 tables (many-to-many). IDs are generated from a sequence, but I have already googled and set hibernate.id.new_generator_mappings = true and the allocationSize + sequence increment to 1000.
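The ID configuration described above can be sketched like this (the entity name `Record` and sequence name are hypothetical, not from the question). The key point is that `allocationSize` must match the database sequence's `INCREMENT BY`, so Hibernate can hand out IDs from memory and only hit the sequence once per 1000 inserts:

```java
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.SequenceGenerator;

// Assumes a matching sequence on the PostgreSQL side, e.g.:
//   CREATE SEQUENCE record_id_seq INCREMENT BY 1000;
@Entity
public class Record {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "record_seq")
    @SequenceGenerator(name = "record_seq",
                       sequenceName = "record_id_seq",
                       allocationSize = 1000)  // one sequence round-trip per 1000 IDs
    private Long id;
}
```

With `hibernate.id.new_generator_mappings=true` set in the Hibernate properties, the pooled optimizer is used, so a sequence value of N covers IDs (N-999)..N rather than N*1000 blocks.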

Also, the amount of data is not anything extraordinary at all; the file is 90 MB.

I have verified with VisualVM that most of the time is spent in the JDBC driver (PostgreSQL) and Hibernate. I think the issue is related to a unique constraint in the child table. The service layer makes a manual check (= SELECT) before inserting. If the record already exists, it reuses it instead of waiting for a constraint exception.
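The find-or-create check described above might look like the following sketch (the `Tag` entity and its `name` field are hypothetical stand-ins for the child-table record). Looking the row up first avoids relying on the unique-constraint violation, at the cost of one SELECT per record:

```java
import jakarta.persistence.EntityManager;
import java.util.List;

public class TagService {

    // Returns the existing row if one matches, otherwise persists a new one.
    public Tag findOrCreate(EntityManager em, String name) {
        List<Tag> existing = em
            .createQuery("SELECT t FROM Tag t WHERE t.name = :name", Tag.class)
            .setParameter("name", name)
            .getResultList();
        if (!existing.isEmpty()) {
            return existing.get(0);  // reuse the existing row instead of inserting
        }
        Tag tag = new Tag(name);
        em.persist(tag);
        return tag;
    }
}
```

Each such SELECT is a synchronous round trip to the database, which is one reason the 20k checks mentioned below add measurable time on top of the inserts.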

So to sum it up: for this specific file there will be 1 insert per table per record (it could be different, but not for this file, which is the ideal (fastest) case). That means roughly 60k inserts + 20k selects in total. Still, over 100 minutes seems very long (yes, hardware counts, and it is a simple PC with a 7200 rpm drive, no SSD or RAID). However, this is a rewrite of a previous application (plain JDBC) in which the same insert on this hardware took about 15 minutes. Considering that in both cases about 4-5 minutes is spent on "pre-processing", the increase is massive.

Any tips on how this could be improved? Is there any batch-loading functionality?

Solution

See: spring-data JPA: manual commit transaction and restart new one

Add entityManager.flush() and entityManager.clear() after every n-th call to the save() method. If you use Hibernate, add hibernate.jdbc.batch_size and set it to n. 100 seems a reasonable choice.
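The flush/clear pattern from the answer can be sketched as follows, with n = 100 matching `hibernate.jdbc.batch_size=100` (the `records` list and entity type are hypothetical). Clearing the persistence context keeps the session small, and flushing in fixed-size chunks lets Hibernate group the queued INSERTs into JDBC batches:

```java
import jakarta.persistence.EntityManager;
import java.util.List;

public class BatchInserter {

    private static final int BATCH_SIZE = 100;  // keep equal to hibernate.jdbc.batch_size

    public void insertAll(EntityManager entityManager, List<Record> records) {
        for (int i = 0; i < records.size(); i++) {
            entityManager.persist(records.get(i));
            if ((i + 1) % BATCH_SIZE == 0) {
                entityManager.flush();  // push the queued INSERTs as one JDBC batch
                entityManager.clear();  // detach entities so the session stays small
            }
        }
        entityManager.flush();          // flush the final partial batch
        entityManager.clear();
    }
}
```

Without the periodic clear(), every persisted entity stays managed, so dirty checking at each flush gets slower as the session grows; with it, memory use and flush time stay roughly constant across the whole file.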

Performance increase was > 10x, probably close to 100x.
