Spring Data JPA - concurrent bulk inserts/updates


Problem description


At the moment I am developing a Spring Boot application which mainly pulls product review data from a message queue (~5 concurrent consumers) and stores it in a MySQL DB. Each review can be uniquely identified by its reviewIdentifier (String), which is the primary key and can belong to one or more products (e.g. products with different colors). Here is an excerpt of the data model:

@Entity
public class ProductPlacement implements Serializable {

   private static final long serialVersionUID = 1L;

   @Id
   @GeneratedValue(strategy = GenerationType.AUTO)
   @Column(name = "product_placement_id")
   private long id;

   @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "productPlacements")
   private Set<CustomerReview> customerReviews;
}

@Entity
public class CustomerReview implements Serializable {

   private static final long serialVersionUID = 1L;

   @Id
   @Column(name = "customer_review_id")
   private String reviewIdentifier;

   @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
   @JoinTable(
        name = "tb_miner_review_to_product",
        joinColumns = @JoinColumn(name = "customer_review_id"),
        inverseJoinColumns = @JoinColumn(name = "product_placement_id")
   )
   private Set<ProductPlacement> productPlacements;
}

One message from the queue contains 1-15 reviews and a productPlacementId. Now I want an efficient method to persist the reviews for the product. There are basically two cases which need to be considered for each incoming review:

  1. The review is not in the database -> insert the review with a reference to the product contained in the message
  2. The review is already in the database -> just add the product reference to the productPlacements set of the existing review.

Currently my method for persisting the reviews is not optimal. It looks as follows (uses Spring Data JpaRepositories):

@Override
@Transactional
public void saveAllReviews(List<CustomerReview> customerReviews, long productPlacementId) {
    ProductPlacement placement = productPlacementRepository.findOne(productPlacementId);
    for (CustomerReview review : customerReviews) {
        CustomerReview cr = customerReviewRepository.findOne(review.getReviewIdentifier());
        if (cr != null) {
            cr.getProductPlacements().add(placement);
            customerReviewRepository.saveAndFlush(cr);
        } else {
            Set<ProductPlacement> productPlacements = new HashSet<>();
            productPlacements.add(placement);
            review.setProductPlacements(productPlacements);
            customerReviewRepository.saveAndFlush(review);
        }
    }
}

Questions:

  1. I sometimes get ConstraintViolationExceptions because of violating the unique constraint on the reviewIdentifier. This is obviously because I (concurrently) check whether the review is already present and then insert or update it. How can I avoid that?
  2. Is it better to use save() or saveAndFlush() in my case? I get ~50-80 reviews per second. Will Hibernate flush automatically if I just use save(), or will it result in greatly increased memory usage?

Update to question 1: Would a simple @Lock on my review repository prevent the unique-constraint exception?

@Lock(LockModeType.PESSIMISTIC_WRITE)
CustomerReview findByReviewIdentifier(String reviewIdentifier);

What happens when findByReviewIdentifier returns null? Can Hibernate lock the reviewIdentifier for a potential insert even if the method returns null?

Thank you!

Solution

From a performance point of view, I would consider evaluating the solution with the following changes.

  1. Change from a bidirectional ManyToMany to bidirectional OneToMany associations

I had the same question about which option is more efficient in terms of the DML statements that get executed. Quoting from Typical ManyToMany mapping versus two OneToMany:

The option one might be simpler from a configuration perspective, but it yields less efficient DML statements.

Use the second option because whenever the associations are controlled by @ManyToOne associations, the DML statements are always the most efficient ones.
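Concretely, replacing the @ManyToMany with a dedicated join entity mapped to the existing join table might look like the following sketch. The ReviewProductPlacement name and its surrogate key are assumptions for illustration, not part of the original mapping:

```java
// Hypothetical join entity replacing the @ManyToMany. Each side of the old
// many-to-many then holds a @OneToMany(mappedBy = "...") collection of these.
@Entity
@Table(name = "tb_miner_review_to_product")
public class ReviewProductPlacement implements Serializable {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long id;

    // Owning @ManyToOne sides, so Hibernate emits the most efficient DML.
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "customer_review_id")
    private CustomerReview customerReview;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "product_placement_id")
    private ProductPlacement productPlacement;
}
```

CustomerReview and ProductPlacement would then each carry e.g. @OneToMany(mappedBy = "customerReview") / @OneToMany(mappedBy = "productPlacement") collections of ReviewProductPlacement instead of referencing each other directly.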


  2. Enable the batching of DML statements

Enabling batching support results in fewer round trips to the database to insert/update the same number of records.

Quoting from batch INSERT and UPDATE statements:

hibernate.jdbc.batch_size = 50
hibernate.order_inserts = true
hibernate.order_updates = true
hibernate.jdbc.batch_versioned_data = true


  3. Reduce the number of saveAndFlush calls

The current code fetches the ProductPlacement and calls saveAndFlush for each review, which results in no batching of DML statements at all.

Instead, I would consider loading the ProductPlacement entity, adding the List&lt;CustomerReview&gt; customerReviews to the Set&lt;CustomerReview&gt; customerReviews field of the ProductPlacement entity, and finally calling the merge method once at the end, with these two changes:

  • Make the ProductPlacement entity the owner of the association, i.e. move the mappedBy attribute onto the Set&lt;ProductPlacement&gt; productPlacements field of the CustomerReview entity.
  • Make the CustomerReview entity implement the equals and hashCode methods using the reviewIdentifier field. I believe reviewIdentifier is unique and assigned by the application.
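An equals/hashCode pair based on the natural id might look like this minimal, framework-free sketch (the constructor is added here purely for illustration; the real entity also carries its JPA annotations and associations):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class CustomerReview {

    // Natural, application-assigned primary key.
    private final String reviewIdentifier;

    public CustomerReview(String reviewIdentifier) {
        this.reviewIdentifier = reviewIdentifier;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomerReview)) return false;
        return Objects.equals(reviewIdentifier, ((CustomerReview) o).reviewIdentifier);
    }

    @Override
    public int hashCode() {
        return Objects.hash(reviewIdentifier);
    }

    public static void main(String[] args) {
        Set<CustomerReview> reviews = new HashSet<>();
        reviews.add(new CustomerReview("R-1"));
        reviews.add(new CustomerReview("R-1")); // same identifier, not added again
        System.out.println(reviews.size()); // prints 1
    }
}
```

With this in place, re-adding an already-present review to a Set is a no-op, which is what makes the "collect everything, merge once" approach below the fold safe against duplicates within one message.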

Finally, as you do this performance tuning, baseline your performance with the current code. Then make the changes and compare whether they really result in any significant performance improvement for your solution.
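Putting the suggestions together, the persistence method might be restructured roughly as follows. This is only a sketch: it assumes an injected EntityManager, assumes the mappedBy attribute has been moved as described so that ProductPlacement owns the association, and relies on the cascade already declared on the association (CascadeType.ALL includes MERGE):

```java
@Override
@Transactional
public void saveAllReviews(List<CustomerReview> customerReviews, long productPlacementId) {
    ProductPlacement placement = productPlacementRepository.findOne(productPlacementId);
    // With equals/hashCode based on reviewIdentifier, adding to the Set
    // silently de-duplicates reviews already attached to this placement.
    placement.getCustomerReviews().addAll(customerReviews);
    // One merge at the end instead of one saveAndFlush per review, so the
    // flush at commit can take advantage of JDBC batching.
    entityManager.merge(placement);
}
```

This removes the per-review round trips, but it does not by itself solve the concurrent unique-constraint violations from question 1; two transactions can still race on the same reviewIdentifier, so retry-on-conflict or serialized handling per identifier would still need to be considered.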
