EclipseLink batch insert very very slow

Problem description

I'm trying to optimize my JPA implementation, which uses EclipseLink. I've added batch writing to it, but it still takes a lot of time to do 50,000 inserts: more than 10 times as long as doing the exact same inserts with raw SQL over JDBC.

To check whether batch writing was actually working, I inspected my packets with Wireshark, and it is not using batch inserts.

Here is one of the insert packets:

[image: captured insert packet]

It is not doing this:

INSERT INTO ENTITYCLASSTEST (LASTNAME, NAME) VALUES ('sfirosijfhgdoi 0', 'dsufius0'), ('sfirosijfhgdoi 0', 'dsufius0'), ('sfirosijfhgdoi 0', 'dsufius0'), ('sfirosijfhgdoi 0', 'dsufius0')... and so on

I was expecting it to do the above, but it is inserting one row per packet rather than multiple rows per packet.
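To make the expected statement shape concrete, here is a minimal sketch that builds a parameterized multi-row INSERT of the kind shown above. The class and method names are hypothetical and only illustrate the SQL text; they are not part of EclipseLink or JDBC. Note that enabling JDBC batch writing does not by itself guarantee this wire format: whether batched statements are combined into one multi-row statement is up to the JDBC driver (with MySQL Connector/J this rewriting is governed by the `rewriteBatchedStatements` connection property, which is not set in the configuration shown in this question).

```java
import java.util.Collections;

// Hypothetical helper, for illustration only: shows the shape of a multi-row INSERT.
class MultiRowInsertSketch {

    // Builds one INSERT statement with a "(?, ?, ...)" placeholder group per row.
    static String buildMultiRowInsert(String table, String[] columns, int rowCount) {
        if (rowCount < 1) {
            throw new IllegalArgumentException("rowCount must be >= 1");
        }
        // One placeholder per column, e.g. "(?, ?)" for two columns.
        String group = "(" + String.join(", ", Collections.nCopies(columns.length, "?")) + ")";
        return "INSERT INTO " + table
                + " (" + String.join(", ", columns) + ") VALUES "
                + String.join(", ", Collections.nCopies(rowCount, group));
    }

    public static void main(String[] args) {
        // Table and column names are taken from the statement in the question.
        System.out.println(buildMultiRowInsert(
                "ENTITYCLASSTEST", new String[] {"LASTNAME", "NAME"}, 3));
    }
}
```

With three rows, the statement contains three `(?, ?)` groups, so a single round trip can carry all three rows.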

Here is my entity class:

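import java.io.Serializable;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity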
public class EntityClassTest implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;
    private String lastname;

    public EntityClassTest() {
    }

    public EntityClassTest(Long id, String name, String lastname) {
        this.id = id;
        this.name = name;
        this.lastname = lastname;
    }

    public EntityClassTest(String name, String lastname) {
        this.name = name;
        this.lastname = lastname;
    }

    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public String getLastName() {
        return lastname;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setLastName(String lastname) {
        this.lastname = lastname;
    }

    @Override
    public int hashCode() {
        int hash = 0;
        hash += (id != null ? id.hashCode() : 0);
        return hash;
    }

    @Override
    public boolean equals(Object object) {
        // TODO: Warning - this method won't work in the case the id fields are not set
        if (!(object instanceof EntityClassTest)) {
            return false;
        }
        EntityClassTest other = (EntityClassTest) object;
        if ((this.id == null && other.id != null) || (this.id != null && !this.id.equals(other.id))) {
            return false;
        }
        return true;
    }

    @Override
    public String toString() {
        return "database.EntityClassTest [id=" + id + " ]";
    }

}

And here is my persist method, which receives a List and persists all the objects in it.

public void insertListToTable(final String persistenceUnit, final List list) throws SQLException {
        final EntityManagerFactory entityManagerFactory = Persistence.createEntityManagerFactory(persistenceUnit);
        final EntityManager entityManager = entityManagerFactory.createEntityManager();
        final EntityTransaction transaction = entityManager.getTransaction();

        try {            
            final int listSize = list.size();
            transaction.begin();

            for (int i = 0; i < listSize; i++) {
                final Object object = list.get(i);
                entityManager.persist(object);

                // Flush after every 500th object (i is zero-based), matching the
                // JDBC batch size defined in persistence.xml.
                if ((i + 1) % 500 == 0) {
                    // Flush a batch of inserts and release memory:
                    entityManager.flush();
                    entityManager.clear();
                }
            }
            }
            transaction.commit();
        }
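        catch (Exception e) {
            // Roll back only if the transaction is still active.
            if (transaction.isActive()) {
                transaction.rollback();
            }
            // Keep the original exception as the cause.
            throw new SQLException(e.getMessage(), e);
        }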
        finally {
            entityManager.close();
        }
    }

And here is my persistence.xml file, where I set the batch size to 500:

<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1" xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/persistence http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
<persistence-unit name="ExternalServer" transaction-type="RESOURCE_LOCAL">
    <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
    <!-- List of Entity classes -->
    <class>model.EntityClassTest</class>
    <properties>
      <property name="javax.persistence.jdbc.url" value="jdbc:mysql://myServer:3306/testdb?zeroDateTimeBehavior=convertToNull"/>
      <property name="javax.persistence.jdbc.user" value="testdbuser"/>
      <property name="javax.persistence.jdbc.driver" value="com.mysql.jdbc.Driver"/>
      <property name="javax.persistence.jdbc.password" value="myPassword"/>
      <property name="javax.persistence.schema-generation.database.action" value="create"/>

      <!-- Weaving -->
      <property name="eclipselink.weaving" value="static"/>
      <!-- SQL dialect / Database type -->
      <property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect"/>
      <property name="eclipselink.target-database" value="MySQL"/>
      <!-- Tell the JPA provider to, by default, create the table if it does not exist. -->
      <property name="javax.persistence.schema-generation.database.action" value="create"/>
      <!-- No logging (For development change the value to "FINE") -->
      <property name="eclipselink.logging.level" value="OFF"/>
      <!-- Enable batch writing -->
      <property name="eclipselink.jdbc.batch-writing" value="JDBC"/>
      <!-- Batch size -->
      <property name="eclipselink.jdbc.batch-writing.size" value="500"/>

    </properties>
  </persistence-unit>
</persistence>

So my question is: why is it not batch inserting? I believe I have EclipseLink configured correctly to do so, going by what I've read on the EclipseLink website and here on SO.

//////////////////////////// EDIT ////////////////////////

As suggested in Chris's answer, I changed @GeneratedValue(strategy = GenerationType.IDENTITY) to @GeneratedValue(strategy = GenerationType.SEQUENCE) in my EntityClassTest and re-ran the test. The packets are being sent as before (like the image I posted above), so I'm afraid it didn't fix my problem.

//////////////////////////// EDIT 2 ////////////////////////

I've changed the logging level in the persistence.xml file to FINEST, as shown next.

<property name="eclipselink.logging.level" value="FINEST"/>

And here is the generated log. I put it in a pastebin because it is quite long.

http://pastebin.com/rKihCKMW

It appears to be executing InsertObjectQuery many times.

//////////////////////////// EDIT 3 ////////////////////////

Here is the version of each component I'm using.

+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| innodb_version          | 5.6.12                       |
| protocol_version        | 10                           |
| slave_type_conversions  |                              |
| version                 | 5.6.12-log                   |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Win64                        |
+-------------------------+------------------------------+

Netbeans 8.0

EclipseLink (JPA 2.1)

mysql-connector-java-5.1.24.jar

//////////////////////////// EDIT 4 ////////////////////////

Following CuriousMind's answer, I've edited the id annotation in my EntityClassTest to:

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO, generator="id-seq-gen")
    @SequenceGenerator( name="id-seq-gen", sequenceName="ID_SEQ_GEN", allocationSize=500 )
    private Long id;

But it didn't solve my problem. I'm still getting a single insert per packet (as in the image above), and in the EclipseLink log I'm getting:

[EL Fine]: sql: 2014-10-19 06:44:02.608--ClientSession(824177287)--Connection(1674390738)--Thread(Thread[main,5,main])--SELECT LAST_INSERT_ID()
[EL Finest]: sequencing: 2014-10-19 06:44:02.608--UnitOfWork(1985011414)--Thread(Thread[main,5,main])--assign sequence to the object (1.251 -> database.EntityClassTest [id=null ])
[EL Finest]: query: 2014-10-19 06:44:02.608--UnitOfWork(1985011414)--Thread(Thread[main,5,main])--Execute query InsertObjectQuery(database.EntityClassTest [id=null ])
[EL Finest]: query: 2014-10-19 06:44:02.608--ClientSession(824177287)--Thread(Thread[main,5,main])--Execute query ValueReadQuery(name="ID_SEQ_GEN" sql="SELECT LAST_INSERT_ID()")
[EL Fine]: sql: 2014-10-19 06:44:02.608--ClientSession(824177287)--Connection(1674390738)--Thread(Thread[main,5,main])--INSERT INTO ENTITYCLASSTEST (LASTNAME, NAME) VALUES (?, ?)
    bind => [sfirosijfhgdoi 2068, dsufius1034]
[EL Fine]: sql: 2014-10-19 06:44:02.608--ClientSession(824177287)--Connection(1674390738)--Thread(Thread[main,5,main])--SELECT LAST_INSERT_ID()
[EL Finest]: sequencing: 2014-10-19 06:44:02.608--UnitOfWork(1985011414)--Thread(Thread[main,5,main])--assign sequence to the object (1.252 -> database.EntityClassTest [id=null ])
[EL Finest]: query: 2014-10-19 06:44:02.608--UnitOfWork(1985011414)--Thread(Thread[main,5,main])--Execute query InsertObjectQuery(database.EntityClassTest [id=null ])
[EL Finest]: query: 2014-10-19 06:44:02.608--ClientSession(824177287)--Thread(Thread[main,5,main])--Execute query ValueReadQuery(name="ID_SEQ_GEN" sql="SELECT LAST_INSERT_ID()")
[EL Fine]: sql: 2014-10-19 06:44:02.608--ClientSession(824177287)--Connection(1674390738)--Thread(Thread[main,5,main])--INSERT INTO ENTITYCLASSTEST (LASTNAME, NAME) VALUES (?, ?)
    bind => [sfirosijfhgdoi 2244, dsufius1122]

And so on...

Recommended answer

You are using GenerationType.IDENTITY for sequencing, which requires retrieving the ID from each insert statement one by one. Try a sequencing scheme that allows preallocation in batches of 500 and you will see improvements: http://en.wikibooks.org/wiki/Java_Persistence/Identity_and_Sequencing#Identity_sequencing
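The reasoning in the answer can be illustrated with a toy model. With IDENTITY, the database assigns the key during each INSERT, so the provider has to execute each insert and read the key back (the `SELECT LAST_INSERT_ID()` calls in the log above) before it can continue. With preallocation, a whole block of IDs is reserved in one step, so the inserts no longer depend on a per-row read-back. The sketch below only counts simulated ID round trips. `BlockAllocator` and `roundTripsFor` are hypothetical names, this is not EclipseLink's implementation, and an allocation size of 1 stands in for the one-at-a-time case.

```java
// Toy model, for illustration only: counts simulated round trips needed to obtain IDs.
class IdPreallocationSketch {

    /** Hands out IDs from a locally cached block; one simulated round trip per block. */
    static final class BlockAllocator {
        private final int allocationSize;
        private long nextId = 1;    // next ID to hand out
        private long blockEnd = 0;  // last ID of the currently reserved block
        private int roundTrips = 0; // simulated database calls made to reserve IDs

        BlockAllocator(int allocationSize) {
            if (allocationSize < 1) {
                throw new IllegalArgumentException("allocationSize must be >= 1");
            }
            this.allocationSize = allocationSize;
        }

        long next() {
            if (nextId > blockEnd) {
                // Current block is exhausted: reserve the next block in one round trip.
                roundTrips++;
                blockEnd += allocationSize;
            }
            return nextId++;
        }

        int roundTrips() {
            return roundTrips;
        }
    }

    // Returns how many ID round trips are needed to assign IDs to the given number of rows.
    static int roundTripsFor(int rows, int allocationSize) {
        BlockAllocator allocator = new BlockAllocator(allocationSize);
        for (int i = 0; i < rows; i++) {
            allocator.next();
        }
        return allocator.roundTrips();
    }

    public static void main(String[] args) {
        System.out.println("one ID at a time: " + roundTripsFor(50_000, 1));   // 50000
        System.out.println("blocks of 500:    " + roundTripsFor(50_000, 500)); // 100
    }
}
```

For the 50,000 rows in the question, blocks of 500 reduce the ID lookups from 50,000 to 100 in this model. On MySQL 5.6, which has no native sequence objects, GenerationType.TABLE with a @TableGenerator is one standard JPA option that supports this kind of preallocation; the source does not confirm whether it resolves the behaviour described in the question.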
