批处理文件并与数据库进行比较 [英] Batch processing files and diff with database
问题描述
当前,我正在开发一个Spring-Boot应用程序,该应用程序正在定期尝试处理包含用户数据的文件,其中每行包含以|
分隔的userId
和departamentId
(例如,123534|13
).该文件将包含几百万条记录.
Currently I am developing a Spring-Boot application that is periodically trying to process a file containing user data where every line contains userId
and departamentId
separated by |
for example 123534|13
. That file will be containing few milions of records.
我的要求是以下列方式将该数据加载到mysql数据库中:
My requirement is to load this data into mysql database in such manner that:
- 如果存在具有已处理ID的用户,则不执行任何操作
- 如果该用户不存在,请创建新用户
- 如果用户不在列表中,但是在数据库中,请删除
- 如果数据库中没有当前部门,请创建
- If the user with processed ID exists, do not do anything
- If the user is not existing create new user
- If the user is not on the list but is present in the database, remove it
- If department with current is not present on the database, create it
我做了一些优化,例如
- 缓存部门以填充实体
- 批收集要保存的用户并通过
JpaRepository
saveAll
方法对其进行保存
- Caching Departaments to fill entity
- Batch collecting Users to Save and saving it throuch
JpaRepository
saveAll
method
但是我仍然对数据库进行了太多的数据库调用,我在检查用户是否存在,以便为每个记录创建要保存的实体...
But Still I am making too many database calls to the database, I am checking if user is present in order to create entity for save for every record...
我的实体非常简单:
@Entity
@Table(name = "departaments")
public class Departament{
@Id
@Column(name = "id")
private Long id;
@Column(name = "name")
private String name;
和:
@Entity
@Table(name = "users")
public class User {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "id")
private Long id;
@ManyToOne
@JoinColumn(name = "departament_id")
private Departament departament;
有人遇到这样的问题吗?
是否可以对其进行更多优化?
有什么好的处理模式吗?
推荐答案
如果这是替换",请执行以下操作以避免停机:
If it is a "replacement", do this to avoid any downtime:
CREATE TABLE new LIKE old;
LOAD DATA INFILE ... (and any other massaging)
RENAME TABLE real TO old, new TO real;
DROP TABLE old;
如果它是增量",则将LOAD
放入单独的表中,然后执行适当的SQL语句以执行更新.在您的课题中,每个项目符号大约需要一条SQL语句.没有循环.
If it is a "delta", the LOAD
it into a separate table, then execute suitable SQL statements to perform the updates. It would be approximately one SQL statement per bullet item in your Question. No loops.
这篇关于批处理文件并与数据库进行比较的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!