批处理文件并与数据库进行比较 [英] Batch processing files and diff with database

查看:119
本文介绍了批处理文件并与数据库进行比较的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当前,我正在开发一个Spring-Boot应用程序,该应用程序正在定期尝试处理包含用户数据的文件,其中每行包含以|分隔的userIddepartamentId(例如,123534|13).该文件将包含几百万条记录.

Currently I am developing a Spring-Boot application that is periodically trying to process a file containing user data where every line contains userId and departamentId separated by | for example 123534|13. That file will be containing few milions of records.

我的要求是以下列方式将该数据加载到mysql数据库中:

My requirement is to load this data into mysql database in such manner that:

  • 如果存在具有已处理ID的用户,则不执行任何操作
  • 如果该用户不存在,请创建新用户
  • 如果用户不在列表中,但是在数据库中,请删除
  • 如果数据库中没有当前部门,请创建
  • If the user with processed ID exists, do not do anything
  • If the user is not existing create new user
  • If the user is not on the list but is present in the database, remove it
  • If department with current is not present on the database, create it

我做了一些优化,例如

  • 缓存部门以填充实体
  • 批收集要保存的用户并通过JpaRepository saveAll方法对其进行保存
  • Caching Departaments to fill entity
  • Batch collecting Users to Save and saving it throuch JpaRepository saveAll method

但是我仍然对数据库进行了太多的数据库调用,我在检查用户是否存在,以便为每个记录创建要保存的实体...

But Still I am making too many database calls to the database, I am checking if user is present in order to create entity for save for every record...

我的实体非常简单:

@Entity
@Table(name = "departaments")
public class Departament{

    @Id
    @Column(name = "id")
    private Long id;

    @Column(name = "name")
    private String name;

和:

@Entity
@Table(name = "users")
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id")
    private Long id;

    @ManyToOne
    @JoinColumn(name = "departament_id")
    private Departament departament;

有人遇到这样的问题吗?

是否可以对其进行更多优化?

有什么好的处理模式吗?

推荐答案

如果这是替换",请执行以下操作以避免停机:

If it is a "replacement", do this to avoid any downtime:

CREATE TABLE new LIKE old;
LOAD DATA INFILE ... (and any other massaging)
RENAME TABLE real TO old, new TO real;
DROP TABLE old;

如果它是增量",则将LOAD放入单独的表中,然后执行适当的SQL语句以执行更新.在您的课题中,每个项目符号大约需要一条SQL语句.没有循环.

If it is a "delta", the LOAD it into a separate table, then execute suitable SQL statements to perform the updates. It would be approximately one SQL statement per bullet item in your Question. No loops.

这篇关于批处理文件并与数据库进行比较的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆