需要在5秒钟内使用休眠在mysql中插入100000行 [英] Need to insert 100000 rows in mysql using hibernate in under 5 seconds

查看:95
本文介绍了需要在5秒钟内使用休眠在mysql中插入100000行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图在5秒钟内使用Hibernate(JPA)在MYSQL表中插入100,000行.我已经尝试了冬眠提供的所有技巧,但仍然不能超过35秒.

I am trying to insert 100,000 rows in a MYSQL table under 5 seconds using Hibernate(JPA). I have tried every trick hibernate offers and still can not do better than 35 seconds.

第一次优化:我从IDENTITY序列生成器开始,这导致60秒的插入时间.后来我放弃了序列生成器,开始自己读取MAX(id)并使用AtomicInteger.incrementAndGet()自己分配字段,从而自己分配了@Id字段.这样将插入时间减少到35秒.

1st optimisation : I started with IDENTITY sequence generator which was resulting in 60 seconds to insert. I later abandoned the sequence generator and started assigning the @Id field myself by reading the MAX(id) and using AtomicInteger.incrementAndGet() to assign fields myself. That reduced the insert time to 35 seconds.

第二次优化:我通过添加

2nd optimisation : I enabled batch inserts, by adding

<prop key="hibernate.jdbc.batch_size">30</prop> <prop key="hibernate.order_inserts">true</prop> <prop key="hibernate.current_session_context_class">thread</prop> <prop key="hibernate.jdbc.batch_versioned_data">true</prop>

<prop key="hibernate.jdbc.batch_size">30</prop> <prop key="hibernate.order_inserts">true</prop> <prop key="hibernate.current_session_context_class">thread</prop> <prop key="hibernate.jdbc.batch_versioned_data">true</prop>

进行配置.令我震惊的是,批量插入对于减少插入时间没有任何作用.仍然是35秒!

to the configuration. I was shocked to find that batch inserts did absolutely nothing to decrease insert time. It was still 35 seconds!

现在,我正在考虑尝试使用多个线程进行插入. 有人有指针吗?我应该选择MongoDB吗?

Now, I am thinking about trying to insert using multiple threads. Anyone has any pointers? Should I have chosen MongoDB?

以下是我的配置: 1.休眠配置 `

Below is my configuration: 1. Hibernate configuration `

<bean id="entityManagerFactoryBean" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
        <property name="dataSource" ref="dataSource" />
        <property name="packagesToScan" value="com.progresssoft.manishkr" />
        <property name="jpaVendorAdapter">
            <bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" />
        </property>
        <property name="jpaProperties">
            <props>
                <prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto}</prop>
                <prop key="hibernate.dialect">${hibernate.dialect}</prop>
                <prop key="hibernate.show_sql">${hibernate.show_sql}</prop>
                <prop key="hibernate.format_sql">${hibernate.format_sql}</prop>
                <prop key="hibernate.jdbc.batch_size">30</prop>
                <prop key="hibernate.order_inserts">true</prop>
                <prop key="hibernate.current_session_context_class">thread</prop>
                <prop key="hibernate.jdbc.batch_versioned_data">true</prop>
            </props>
        </property>
    </bean>

    <bean class="org.springframework.jdbc.datasource.DriverManagerDataSource"
          id="dataSource">
        <property name="driverClassName" value="${database.driver}"></property>
        <property name="url" value="${database.url}"></property>
        <property name="username" value="${database.username}"></property>
        <property name="password" value="${database.password}"></property>
    </bean>

    <bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
        <property name="entityManagerFactory" ref="entityManagerFactoryBean" />
    </bean>



    <tx:annotation-driven transaction-manager="transactionManager" />

`

  1. 实体配置:

`

@Entity
@Table(name = "myEntity")
public class MyEntity {

    @Id
    private Integer id;

    @Column(name = "deal_id")
    private String dealId;

    ....
    ....

    @Temporal(TemporalType.TIMESTAMP)
    @Column(name = "timestamp")
    private Date timestamp;

    @Column(name = "amount")
    private BigDecimal amount;

    @OneToOne(cascade = CascadeType.ALL)
    @JoinColumn(name = "source_file")
    private MyFile sourceFile;

    public Deal(Integer id,String dealId, ....., Timestamp timestamp, BigDecimal amount, SourceFile sourceFile) {
        this.id = id;
        this.dealId = dealId;
        ...
        ...
        ...
        this.amount = amount;
        this.sourceFile = sourceFile;
    }


    public String getDealId() {
        return dealId;
    }

    public void setDealId(String dealId) {
        this.dealId = dealId;
    }

   ...

   ...


    ....

    public BigDecimal getAmount() {
        return amount;
    }

    public void setAmount(BigDecimal amount) {
        this.amount = amount;
    }

    ....


    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

`

  1. 永久代码(服务):

`

@Service
@Transactional
public class ServiceImpl implements MyService{

    @Autowired
    private MyDao dao;
....

`void foo(){
        for(MyObject d : listOfObjects_100000){
            dao.persist(d);
        }
}

` 4.道课:

`

@Repository
public class DaoImpl implements MyDao{

    @PersistenceContext
    private EntityManager em;

    public void persist(Deal deal){
        em.persist(deal);
    }
}

`

日志: `

DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] 

... ...

DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 27
18:26:34.011 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - update deal_source_file set invalid_rows=?, source_file=?, valid_rows=? where id=?
18:26:34.015 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 1
18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - committed JDBC Connection
18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - re-enabling autocommit
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.orm.jpa.JpaTransactionManager - Closing JPA EntityManager [org.hibernate.jpa.internal.EntityManagerImpl@2354fb09] after transaction
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.o.jpa.EntityManagerFactoryUtils - Closing JPA EntityManager
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.internal.JdbcCoordinatorImpl - HHH000420: Closing un-released batch
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Releasing JDBC connection
18:26:34.033 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Released JDBC connection

'

推荐答案

尝试了所有可能的解决方案之后,我终于找到了一种在5秒内插入100,000行的解决方案!

After trying all possible solutions I finally found a solution to insert 100,000 rows under 5 seconds!

我尝试过的事情:

1)使用AtomicInteger用自己生成的ID替换了休眠/数据库的AUTOINCREMENT/GENERATED ID.

1) Replaced hibernate/database's AUTOINCREMENT/GENERATED id's by self generated ID's using AtomicInteger

2)启用batch_size = 50的batch_inserts

2) Enabling batch_inserts with batch_size=50

3)在每"batch_size"个persist()调用之后刷新缓存

3) Flushing cache after every 'batch_size' number of persist() calls

4)多线程(未尝试过此操作)

4) multithreading (did not attempt this one)

最后,有效的方法是使用本地多插入查询,并在一个sql插入查询中插入1000行,而不是在每个实体上使用 persist().为了插入100,000个实体,我创建了一个本地查询,例如"INSERT into MyTable VALUES (x,x,x),(x,x,x).......(x,x,x)" [在一个sql插入查询中插入1000行]

Finally what worked was using a native multi-insert query and inserting 1000 rows in one sql insert query instead of using persist() on every entity. For inserting 100,000 entities, I create a native query like this "INSERT into MyTable VALUES (x,x,x),(x,x,x).......(x,x,x)" [1000 row inserts in one sql insert query]

现在,插入100,000条记录大约需要3秒钟!因此瓶颈是orm本身!对于批量插入,似乎唯一起作用的是本机插入查询!

Now it takes around 3 seconds for inserting 100,000 records! So the bottleneck was the orm itself! For bulk inserts, the only thing that seems to work is native insert queries!

这篇关于需要在5秒钟内使用休眠在mysql中插入100000行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆