How to persist data to a MySql database from a 32k row Excel using JPA Pagination?

Problem Description

I have a large Excel file with 32k rows, and Java-Spring code that persists the Excel data to a MySQL database. My code works for about 6k rows, but not for the entire Excel file, due to a JPA limitation. I read that this can be done with JPA Pagination, but so far I have only found examples that read already-persisted data from the DB and render it to a UI. The Excel file contains 32k medicines, and these rows should be persisted into the DB.

I have this Controller layer with the following method:

    public ResponseEntity<ResponseMessage> uploadFile(@RequestParam("file") MultipartFile file,
                                                      @RequestParam(defaultValue = "0") int page,
                                                      @RequestParam(defaultValue = "6000") int size) {

        String message = "";

        if (ExcelHelper.hasExcelFormat(file)) {
            try {
                // the following 4 lines are my (failed) attempt to resolve this with pagination
                List<Medicine> medicines = new ArrayList<>();
                Pageable paging = PageRequest.of(page, size);
                Page<Medicine> pageMedicamente = medicineRepositoryDao.save(paging);
                medicines = pageMedicamente.getContent();

                medicineService.save(file);

                message = "Uploaded the file successfully: " + file.getOriginalFilename();
                return ResponseEntity.status(HttpStatus.OK).body(new ResponseMessage(message));
            } catch (Exception e) {
                message = "Could not upload the file: " + file.getOriginalFilename() + "!";
                return ResponseEntity.status(HttpStatus.EXPECTATION_FAILED).body(new ResponseMessage(message));
            }
        }
        // ...
    }

The Repository layer:

@Repository
public interface MedicineRepositoryDao extends JpaRepository<Medicine, Long> {

    Page<Medicine> save(Pageable pageable);

}

And the Service layer:

    public void save(MultipartFile file) {
        try {
            List<Medicine> medicines = ExcelHelper.excelToMedicine(file.getInputStream());
            medicineRepositoryDao.saveAll(medicines);
        } catch (IOException e) {
            throw new RuntimeException("fail to store excel data: " + e.getMessage());
        }
    }

Answer

I think you have a couple of things mixed up here.

  1. I don't think Spring has any relevant limitation on the number of rows you may persist here. But JPA does: JPA keeps a reference to any entity that you save in its first-level cache. So for a large number of rows/entities this hogs memory, and it also makes some operations slower, since entities get looked up or processed one by one.

  2. Pagination is for reading entities, not for saving them.

You have a couple of options in this situation.

  1. Don't use JPA. For simply reading data from a file and writing it into a database, JPA hardly offers any benefit. This can be done almost trivially with just a JdbcTemplate or NamedParameterJdbcTemplate, and it will be much faster, since you skip the overhead of JPA, which you don't benefit from anyway in this scenario. If you do want to use an ORM, you might want to take a look at Spring Data JDBC, which is conceptually simpler and doesn't keep references to entities, and therefore should show better characteristics in this scenario. I recommend not using an ORM here, since you don't seem to benefit from having entities; creating them and then having the ORM extract the data from them again is really a waste of time.
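To make the JdbcTemplate route concrete, here is a minimal sketch. The `Medicine` fields, the table name, and the column names are assumptions for illustration, not taken from the question; the point is only how parsed rows become the parameter batches that `JdbcTemplate.batchUpdate` expects:

```java
import java.util.List;

public class JdbcImportSketch {

    // Minimal stand-in for the real Medicine entity (hypothetical fields).
    record Medicine(String name, String atcCode) {}

    // Map each parsed row to the Object[] parameter set that
    // JdbcTemplate.batchUpdate(String, List<Object[]>) expects.
    static List<Object[]> toBatchArgs(List<Medicine> rows) {
        return rows.stream()
                .map(m -> new Object[] { m.name(), m.atcCode() })
                .toList();
    }

    public static void main(String[] args) {
        List<Medicine> rows = List.of(new Medicine("Aspirin", "N02BA01"));
        List<Object[]> batchArgs = toBatchArgs(rows);
        System.out.println(batchArgs.get(0)[0]); // Aspirin

        // With a configured Spring JdbcTemplate this would then be:
        // jdbcTemplate.batchUpdate(
        //     "INSERT INTO medicine (name, atc_code) VALUES (?, ?)", batchArgs);
    }
}
```

The actual `jdbcTemplate.batchUpdate(...)` call is shown only as a comment, since it needs a configured `DataSource`.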

  2. Break your import into batches. This means you persist e.g. 1000 rows at a time, write them to the database and commit the transaction before you continue with the next 1000 rows. For JPA this is pretty much a necessity, for the reasons laid out above. With JDBC (i.e. JdbcTemplate & Co) this probably isn't necessary for 32k rows, but it might improve performance, and it might be useful for recoverability if an insert fails. Spring Batch will help you implement that.
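The chunking itself is plain list slicing. A minimal sketch (the `partition` helper and the commented-out per-chunk save are hypothetical names, not part of the question's code):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchImport {

    // Split a list into consecutive chunks of at most `size` elements.
    static <T> List<List<T>> partition(List<T> rows, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += size) {
            chunks.add(rows.subList(i, Math.min(i + size, rows.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 32000; i++) rows.add(i);

        List<List<Integer>> chunks = partition(rows, 1000);
        System.out.println(chunks.size());          // 32
        System.out.println(chunks.get(31).size());  // 1000

        // Each chunk would then be saved and committed in its own transaction, e.g.:
        // for (List<Medicine> chunk : chunks) { medicineService.saveChunk(chunk); }
        // where saveChunk is a @Transactional method that calls saveAll for one chunk.
    }
}
```

With JPA you would additionally flush and clear the persistence context after each chunk, so the first-level cache doesn't grow with the whole file.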

  3. While the previous point talks about batching in the sense of breaking your import into chunks, you should also look into batching on the JDBC side, where you send multiple statements, or a single statement with multiple sets of parameters, to the database in one go. Again, this should improve performance.
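As an illustration of the single-statement variant, here is a sketch that builds one parameterized `INSERT` covering several rows (table and column names are assumptions):

```java
import java.util.List;

public class BatchSql {

    // Build a single multi-row INSERT with one "(?, ?, ...)" group per row.
    static String multiRowInsert(String table, List<String> columns, int rowCount) {
        String cols = String.join(", ", columns);
        String group = "(" + "?, ".repeat(columns.size() - 1) + "?)";
        StringBuilder sql = new StringBuilder("INSERT INTO " + table + " (" + cols + ") VALUES ");
        for (int i = 0; i < rowCount; i++) {
            sql.append(i == 0 ? group : ", " + group);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(multiRowInsert("medicine", List.of("name", "dose"), 3));
        // INSERT INTO medicine (name, dose) VALUES (?, ?), (?, ?), (?, ?)
    }
}
```

In practice you rarely build this by hand: `JdbcTemplate.batchUpdate` with one statement and many parameter sets gives the same effect, and with MySQL's Connector/J the batch is only rewritten into a multi-row insert when `rewriteBatchedStatements=true` is set on the connection URL.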

  4. Finally, there are often alternatives outside of the Javaverse that might be more suitable for the job. Some databases have tools to load flat files extremely efficiently (MySQL's LOAD DATA INFILE, for example).
