JPA: EntityManager is taking too long to save the data


Problem description


I have a CSV file of data which has altogether 100 000 records. I am iterating over the records and trying to update 5 tables for each record. Here is the sample data:

EAN Code,Site,Genric Material,Material,Sap Ean Code,Style,Color,Size,MRP,Gender,EAN Code,Season,Collection,BRAND,Color revision,Category (L5),Category (L6)
123456789,6001,000000000061000102,000000061000102001,61000102001,03/BE100,SC/TG,L/112 cm,850.00,MENS,123456789,AW12,Colors,XXXXXX,RD/TG,Tee Shirt,Graphic


The five tables that will be updated for each iteration are as follows:

  1. Master
  2. MasterDescription
  3. Attributes
  4. AttributeValues
  5. Association table


The relationship between the above mentioned tables are as follows:

Master M-M AttributeValues

Master M-1 MasterDescription

Master M-M Attributes

Attributes 1-M AttributeValues
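
For reference, a minimal sketch of how these relationships might be mapped in JPA, using the entity names that appear in the code below (the field names, id mappings, and mappedBy value are assumptions, not taken from the actual entities):

import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToMany;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

@Entity
public class Ean { // maps to the "Master" table

    @Id
    @GeneratedValue
    private Long id;

    // Master M-1 MasterDescription
    @ManyToOne
    private EanMasterDiscription eanMasterDesciption;

    // Master M-M Attributes
    @ManyToMany
    private Set<EanAttributes> eanAttributes = new HashSet<>();

    // Master M-M AttributeValues
    @ManyToMany
    private Set<EanAttributeValues> eanAttributeValues = new HashSet<>();
}

@Entity
class EanAttributes {

    @Id
    @GeneratedValue
    private Long id;

    // Attributes 1-M AttributeValues (mappedBy assumes a field named
    // "eanAttributes" on EanAttributeValues, as used in the service code below)
    @OneToMany(mappedBy = "eanAttributes")
    private Set<EanAttributeValues> attributeValues = new HashSet<>();
}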


Here is the code that I have to save the CSV data into 5 tables in a single session using batch technique:

Service class

@Service
public class EanService{

@Autowired
public EanRepository eanrepository;

@Autowired
public UserRepository userRepository;

// Method that saves data from CSV to DataBase
@Transactional
public void saveEANMasterData1(BufferedReader br, String userName,
        List<EanAttributes> attributes, String eanMasterName,String description) {
    int i =1;

    EanMasterDiscription eanDes = new EanMasterDiscription();
    User user = userRepository.findUserByUsername(userName);
    EanMasterDiscription deciption = null;
    eanDes.setDescription(description);
    eanDes.setMasterName(eanMasterName);
    eanDes.setDate(new Timestamp(Calendar.getInstance()
            .getTimeInMillis()));
    String line;
    try {
        List<Ean> eans = new ArrayList<Ean>();
        // iterating over each record in the CSV and saving the data into DB            
        while (((line = br.readLine()) != null)) {
             String[] cols = line.split(",");
             // Style Keeping Unit
             Ean ean = new Ean();
             for(EanAttributes attr : attributes){
                 EanAttributeValues eanAttributeValues = new EanAttributeValues();
                 if(attr.getAttrInferredType().equalsIgnoreCase("EAN")){
                         ean.setEAN(cols[attr.getAttributeOrder()]);
                 }else if(attr.getAttrInferredType().equalsIgnoreCase("Season")){
                     ean.setSeason(cols[attr.getAttributeOrder()]);
                 }else {
                     if(attr.getAttrInferredType().equalsIgnoreCase("Attribute")){
                         EanAttributes eanAttr = eanrepository.loadAttrsListByAttName(attr.getAttributeName());
                         if(eanAttr == null){
                             eanAttributeValues.setAttributeValue(cols[attr.getAttributeOrder()]);
                             eanAttributeValues.setEanAttributes(attr);
                             ean.getEanAttributeValues().add(eanAttributeValues);
                             ean.getEanAttributes().add(attr);
                             attr.getEan().add(ean);
                         }else{
                             ean.getEanAttributes().add(eanAttr);
                             eanAttr.getEan().add(ean);
                             if(eanrepository.isAttributeValueAvailable(cols[attr.getAttributeOrder()])){
                                 eanAttributeValues.setAttributeValue(cols[attr.getAttributeOrder()]);
                                 eanAttributeValues.setEanAttributes(eanAttr);
                                 ean.getEanAttributeValues().add(eanAttributeValues);
                             }else{
                                 EanAttributeValues values = eanrepository.loadDataByAttrValue(cols[attr.getAttributeOrder()]);
                                 ean.getEanAttributeValues().add(values);
                                 values.getEan().add(ean);
                             }
                         }
                         eanAttributeValues.getEan().add(ean);
                     }
                 }
             }
             if(!eanrepository.isEanMasterNameAvailable(eanMasterName)){
                EanMasterDiscription eanMasterDes = eanrepository.loadDataByMasterName(eanMasterName);
                 ean.setEanMasterDesciption(eanMasterDes);
             }else{
                 ean.setEanMasterDesciption(eanDes);
             }
             ean.setUser(user);
             if(eanrepository.isEanWithSeasonAvailable(ean.getEAN(),ean.getSeason())){
                     // Persisting Ean; I think there is some problem with this method
                     eanrepository.saveEanData(ean,i);
             }else{
                 System.out.println("************ EAN ALREADY EXIST ******************** ");
             }

             i++;
        }
    } catch (NumberFormatException | IOException e) {
        e.printStackTrace();
    }       
    }
}

Repository class

@Repository
public class EanRepository{

@PersistenceContext
EntityManager em;

public void saveEanData(Ean ean , int recordNum){
    em.merge(ean);
    if(recordNum % 50 == 0){
        em.flush();
        em.clear();
        // em.getEntityManagerFactory().getCache().evictAll();
    }
}

}


But this is taking too much time (nearly 10 hours) to finish saving all the 100 000 records. How can we reduce the time, and what am I missing?

Answer


I was having the same problems in my batch application, and we incorporated two techniques which vastly sped up the process of importing the data:


1) Multithreading - You have to take advantage of multiple threads processing your file data and doing the saving.


The way we did it was to first read all the data from the file and pack it into a Set of POJO objects.


Then, based on the number of threads we could create, we would split the Set evenly and feed each thread a certain range of the data.


Then each set would be processed in parallel.


I am not going to get into the details as this is outside the scope of this question. One tip I can give is that you should try to take advantage of java.util.concurrent and the features it offers.
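
A minimal sketch of that idea with java.util.concurrent, assuming the CSV rows have already been parsed into Ean POJOs and that the service exposes a hypothetical saveChunk(List<Ean>) method wrapping the per-record saving from the question (neither is part of the question's code):

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelEanImport {

    // Splits the parsed rows into evenly sized chunks and saves each chunk on its own thread.
    public static void importInParallel(Set<Ean> parsedRows, EanService service) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        List<Ean> rows = new ArrayList<>(parsedRows);
        int chunkSize = (rows.size() + threads - 1) / threads;

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int start = 0; start < rows.size(); start += chunkSize) {
            // Each chunk becomes one task (and, ideally, its own transaction).
            List<Ean> chunk = new ArrayList<>(rows.subList(start, Math.min(start + chunkSize, rows.size())));
            pool.submit(() -> service.saveChunk(chunk)); // saveChunk is a hypothetical method
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}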


2) Batch saving - The second improvement we made was to take advantage of Hibernate's batch save feature (you have added the Hibernate tag, so I assume it is your underlying persistence provider):


You can try and take advantage of the bulk insert feature.


There is a Hibernate property which you can define to enable this feature:

<property name="jdbc.batch_size">250</property>


With this batch setting you should have output like:

insert into Table(id, name) values (1, 'na1'), (2, 'na2'), (3, 'na3')..

instead of

insert into Table(id , name) values (1, 'na1');
insert into Table(id , name) values (2, 'na2');
insert into Table(id , name) values (3, 'na3');
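
If the persistence unit is configured in code rather than in an XML file, the same setting can be passed as a map when the EntityManagerFactory is created; a minimal sketch, where the unit name "eanPersistenceUnit" and the two order_* properties are my additions, not something from the question:

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class BatchSizeConfigExample {

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Number of statements Hibernate groups into a single JDBC batch.
        props.put("hibernate.jdbc.batch_size", "250");
        // Optional companions that help Hibernate build larger batches.
        props.put("hibernate.order_inserts", "true");
        props.put("hibernate.order_updates", "true");

        // "eanPersistenceUnit" is a placeholder for the real persistence-unit name.
        EntityManagerFactory emf =
                Persistence.createEntityManagerFactory("eanPersistenceUnit", props);
        // ... obtain EntityManagers from emf as usual ...
        emf.close();
    }
}

Note that Hibernate silently disables JDBC insert batching for entities whose ids use IDENTITY generation, so a SEQUENCE or TABLE generator may be needed to actually benefit from this setting.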


3) Flush count - you have your count set to 50 before you flush to the DB. Now, with batch inserts enabled, maybe you could raise it to a few hundred. Try to experiment with this number to find the sweet spot.
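
For example, the repository method from the question could take its flush interval from a constant kept in sync with the jdbc.batch_size setting; a minimal sketch (the constant name and the value 250 are assumptions):

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Repository;

@Repository
public class EanRepository {

    // Keep this in sync with hibernate.jdbc.batch_size so one flush maps to one JDBC batch.
    private static final int BATCH_SIZE = 250;

    @PersistenceContext
    private EntityManager em;

    public void saveEanData(Ean ean, int recordNum) {
        em.merge(ean);
        if (recordNum % BATCH_SIZE == 0) {
            em.flush();  // send the pending inserts to the database as one batch
            em.clear();  // detach flushed entities so the persistence context stays small
        }
    }
}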
