无论如何,我无法在 Hibernate 中批处理 MySQL INSERT 语句 [英] No matter what, I can't batch MySQL INSERT statements in Hibernate

查看:23
本文介绍了无论如何,我无法在 Hibernate 中批处理 MySQL INSERT 语句的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前面临着众所周知且常见的 Hibernate 插入批处理问题.

I'm currently facing the well-known and common Hibernate insert batch problem.

我需要保存 500 万行的批次.我首先尝试使用更轻的有效载荷.由于我只需要插入 2 种类型的实体(首先是 A 类型的所有记录,然后是 B 类型的所有记录,都指向公共类型 C ManyToOne 父),我想利用最大的优势来自 JDBC 批量插入.

I need to save batches 5 millions of rows long. I'm first trying with a much lighter payload. Since I have to insert entities of only 2 types (first all records of type A, then all records of type B, all pointing to common type C ManyToOne parent), I would like to take the most advantage from JDBC batch insert.

我已经阅读了大量文档,但没有尝试过.

I have already read lots of documentation, but none that I have tried worked.

  • 我知道为了使用批量插入,我不能使用实体生成器.所以我删除了 AUTO_INCREMENT ID,并用一个技巧设置 ID:SELECT MAX(ID) FROM ENTITIES 并每次递增.
  • 我知道我必须定期刷新会话.我会提前发布代码,但无论如何我每 500 个元素执行一次交易.
  • 我知道我必须将 hibernate.jdbc.batch_size 设置为与我的应用程序的批量大小一致,所以我在 LocalSessionFactoryBean 中设置了它(Spring ORM 集成)
  • 我知道我必须在连接 URL 中启用重写批处理语句.
  • I know that in order to use batch inserts I must not use an entity generator. So I removed the AUTO_INCREMENT ID and I'm setting the ID with a trick: SELECT MAX(ID) FROM ENTITIES and increment every time.
  • I know that I must flush the session regularly. I'll post code ahead, but anyway I perform a transaction every 500 elements.
  • I know that I have to set hibernate.jdbc.batch_size consistent with my application's bulk size, so I set it in the LocalSessionFactoryBean (Spring ORM integration)
  • I know I have to enable rewriting batched statements in connection URL.

这是我的实体

公共父实体.这首先插入到单个事务中.我不关心这里的自动增量列.每个批处理作业只有一个记录

Common parent entity. This gets inserted first in a single transaction. I don't care about auto increment column here. Only one record per batch job

@Entity
@Table(...)
@SequenceGenerator(...)
public class Deal
{

    @Id
    @Column(
            name = "DEAL_ID",
            nullable = false)
    @GeneratedValue(
            strategy = GenerationType.AUTO)
    protected Long id;

    ................
}

其中一个孩子(假设每批 250 万条记录)

One of the children (let's say 2.5M records per batch)

@Entity
@Table(
        name = "TA_LOANS")
public class Loan
{

    @Id
    @Column(
            name = "LOAN_ID",
            nullable = false)
    protected Long id;

    @ManyToOne(
            optional = false,
            targetEntity = Deal.class,
            fetch = FetchType.LAZY)
    @JoinColumn(
            name = "DEAL_ID",
            nullable = false)
    protected Deal deal;


    .............
}

其他孩子类型.假设其他 250 万条记录

The other children type. Let's say the other 2.5M records

@Entity
@Table(
        name = "TA_BONDS")
public class Bond
{

    @Id
    @Column(
            name = "BOND_ID")

    @ManyToOne(
            fetch = FetchType.LAZY,
            optional = false,
            targetEntity = Deal.class)
    @JoinColumn(
            name = "DEAL_ID",
            nullable = false,
            updatable = false)
    protected Deal deal;

}

插入记录的简化代码

    long loanIdCounter = loanDao.getMaxId(), bondIdCounter = bondDao.getMaxId(); //Perform SELECT MAX(ID)

    Deal deal = null;

    List<Bond> bondList = new ArrayList<Bond>(COMMIT_BATCH_SIZE); //500 constant value
    List<Loan> loanList = new ArrayList<Loan>(COMMIT_BATCH_SIZE);

    for (String msg: inputStreamReader)
    {
        log.debug(msg.toString());

        if (this is a deal)
        {
            Deal deal = parseDeal(msg.getMessage());

            deal = dealManager.persist(holder.deal); //Called in a separate transaction using Spring annotation @Transaction(REQUIRES_NEW)

        }
        else if (this is a loan)
        {

            Loan loan = parseLoan(msg.getMessage());
            loan.setId(++loanIdCounter);
            loan.setDeal(deal);

            loanList.add(loan);

            if (loanList.size() == COMMIT_BATCH_SIZE)
            {
                loanManager.bulkInsert(loanList); //Perform a bulk insert in a single transaction, not annotated but handled manually this time
                loanList.clear();
            }
        }
        else if (this is a bond)
        {
            Bond bond = parseBond(msg.getMessage());
            bond.setId(++bondIdCounter);
            bond.setDeal(deal);

            bondList.add(bond);



            if (bondList.size() == COMMIT_BATCH_SIZE) //As above
            {
                bondManager.bulkInsert(bondList);
                bondList.clear();

            }
        }
    }

    if (!bondList.isEmpty())
        bondManager.bulkInsert(bondList);
    if (!loanList.isEmpty())
        loanManager.bulkInsert(loanList);
    //Flush remaining items, not important

bulkInsert 的实现:

@Override
public void bulkInsert(Collection<Bond> bonds)
{
    // StatelessSession session = sessionFactory.openStatelessSession();
    Session session = sessionFactory.openSession();
    try
    {
        Transaction t = session.beginTransaction();
        try
        {
            for (Bond bond : bonds)
                // session.persist(bond);
                // session.insert(bond);
                session.save(bond);
        }
        catch (RuntimeException ex)
        {
            t.rollback();
        }
        finally
        {
            t.commit();
        }
    }
    finally
    {
        session.close();
    }

}

正如您从评论中看到的,我尝试了几种有状态/无状态 session 的组合.都没有用.

As you can see from comments, I have tried several combinations of stateful/stateless session. None worked.

我的 dataSource 是一个 ComboPooledDataSource 带有以下 URL

My dataSource is a ComboPooledDataSource with following URL

<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&amp;rewriteBatchedStatements=true" />

我的SessionFactory

<b:bean id="sessionFactory" class="class.that.extends.org.springframework.orm.hibernate3.LocalSessionFactoryBean" lazy-init="false" depends-on="dataSource">
        <b:property name="dataSource" ref="phoenixDataSource" />
        <b:property name="hibernateProperties">
            <b:props>
                <b:prop key="hibernate.dialect">${hibernate.dialect}</b:prop> <!-- MySQL5InnoDb-->
                <b:prop key="hibernate.show_sql">${hibernate.showSQL}</b:prop>
                <b:prop key="hibernate.jdbc.batch_size">500</b:prop>
                <b:prop key="hibernate.jdbc.use_scrollable_resultset">false</b:prop>
                <b:prop key="hibernate.cache.use_second_level_cache">false</b:prop>
                <b:prop key="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</b:prop>
                <b:prop key="hibernate.cache.use_query_cache">false</b:prop>
                <b:prop key="hibernate.validator.apply_to_ddl">false</b:prop>
                <b:prop key="hibernate.validator.autoregister_listeners">false</b:prop>
                <b:prop key="hibernate.order_inserts">true</b:prop>
                <b:prop key="hibernate.order_updates">true</b:prop>
            </b:props>
        </b:property>
</b:bean>

即使我的项目范围的类扩展了LocalSessionFactoryBean,它也不会覆盖它的方法(只添加几个项目范围的方法)

Even if my project-wide class extends LocalSessionFactoryBean, it does not override its methods (only adds few project-wide methods)

这几天我很生气.我读了几篇文章,但没有一篇文章帮助我启用批量插入.我从带有 Spring 上下文的 JUnit 测试中运行我的所有代码(因此我可以@Autowire 我的类).我所有的尝试只产生了很多单独的 INSERT 语句

I'm getting mad since a few days. I read a few articles and none helped me enable batch inserts. I run all of my code from JUnit tests instrumented with Spring context (so I can @Autowire my classes). All of my attempts only produce a lots of separate INSERT statements

我错过了什么?

推荐答案

很可能您的查询正在被重写,但您不会通过查看 Hibernate SQL 日志知道是否.Hibernate 不会重写插入语句 - MySQL 驱动程序会重写它们.换句话说,Hibernate 会向驱动程序发送多个插入语句,然后驱动程序将重写它们.因此,Hibernate 日志只显示 Hibernate 发送给驱动程序的 SQL,而不显示驱动程序发送给数据库的 SQL.

It's likely your queries are being rewritten but you wouldn't know if by looking at the Hibernate SQL logs. Hibernate does not rewrite the insert statements - the MySQL driver rewrites them. In other words, Hibernate will send multiple insert statements to the driver, and then the driver will rewrite them. So the Hibernate logs only show you what SQL Hibernate sent to the driver, not what SQL the driver sent to the database.

您可以通过在连接 url 中启用 MySQL 的 profileSQL 参数来验证这一点:

You can verify this by enabling MySQL's profileSQL parameter in connection url:

<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&amp;rewriteBatchedStatements=true&amp;profileSQL=true" />

使用与您相似的示例,这就是我的输出:

Using an example similar to yours, this is what my output looks like:

insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
Wed Feb 05 13:29:52 MST 2014 INFO: Profiler Event: [QUERY]  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) duration: 1 ms, connection-id: 81, statement-id: 33, resultset-id: 0, message: insert into Person (firstName, lastName, id) values ('person1', 'Name', 1),('person2', 'Name', 2),('person3', 'Name', 3),('person4', 'Name', 4),('person5', 'Name', 5),('person6', 'Name', 6),('person7', 'Name', 7),('person8', 'Name', 8),('person9', 'Name', 9),('person10', 'Name', 10)

前 10 行由 Hibernate 记录,尽管这不是实际发送到 MySQL 数据库的内容.最后一行来自 MySQL 驱动程序,它清楚地显示了具有多个值的单个批量插入,这就是实际发送到 MySQL 数据库的内容.

The first 10 lines are being logged by Hibernate though this not what is actually being sent to MySQL database. The last line is coming from MySQL driver and it clearly shows a single batch insert with multiple values and that is what is actually being sent to the MySQL database.

这篇关于无论如何,我无法在 Hibernate 中批处理 MySQL INSERT 语句的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆