Spring Batch - How to read 5 million records in a faster way?
Problem Description
I'm developing a Spring Boot v2.2.5.RELEASE and Spring Batch example. In this example, I'm reading 5 million records with a JdbcPagingItemReader from a Postgres system in one data center and writing them into MongoDB in another data center.
This migration is too slow, and I need better performance from this batch job. I'm not sure how to use partitioning, because the PK of that table holds UUID values, so I can't think of using ColumnRangePartitioner. Is there a best approach to implement this?
Approach 1:
@Bean
public JdbcPagingItemReader<Customer> customerPagingItemReader() {
    // Read database records using JDBC in a paging fashion
    JdbcPagingItemReader<Customer> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(this.dataSource);
    // JDBC fetch size; note that the reader's page size is a separate
    // setting and defaults to 10 unless setPageSize(...) is called
    reader.setFetchSize(1000);
    reader.setRowMapper(new CustomerRowMapper());

    // Sort keys: paging requires a unique, ordered key
    Map<String, Order> sortKeys = new HashMap<>();
    sortKeys.put("cust_id", Order.ASCENDING);

    // Postgres implementation of a PagingQueryProvider using database-specific features
    PostgresPagingQueryProvider queryProvider = new PostgresPagingQueryProvider();
    queryProvider.setSelectClause("*");
    queryProvider.setFromClause("from customer");
    queryProvider.setSortKeys(sortKeys);
    reader.setQueryProvider(queryProvider);

    return reader;
}
Then for the Mongo writer, I've used Spring Data Mongo as a custom writer:
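A minimal sketch of what such a custom writer could look like, assuming a MongoTemplate bean; the bulk-insert body is illustrative, not the original code (the step below refers to this bean as writer(null)):

@Bean
public ItemWriter<Customer> writer(MongoTemplate mongoTemplate) {
    // Hypothetical sketch: bulk-insert each chunk of customers into MongoDB
    return items -> mongoTemplate.insert(items, Customer.class);
}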
Job details
@Bean
public Job multithreadedJob() {
    return this.jobBuilderFactory.get("multithreadedJob")
            .start(step1())
            .build();
}

@Bean
public Step step1() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4);
    taskExecutor.setMaxPoolSize(4);
    taskExecutor.afterPropertiesSet();

    return this.stepBuilderFactory.get("step1")
            // JdbcPagingItemReader is thread-safe, so it can back a multi-threaded step
            .<Customer, Customer>chunk(100)
            .reader(customerPagingItemReader())
            .writer(writer(null)) // the custom Spring Data Mongo writer
            .taskExecutor(taskExecutor)
            .build();
}
Approach 2: Would AsyncItemProcessor and AsyncItemWriter be the better option, given that I'd still have to read using the same JdbcPagingItemReader?
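For reference, a minimal sketch of how AsyncItemProcessor and AsyncItemWriter (from spring-batch-integration) could be wired; this is illustrative, not from the original post, and the pass-through delegate is an assumption:

@Bean
public AsyncItemProcessor<Customer, Customer> asyncItemProcessor(TaskExecutor taskExecutor) {
    AsyncItemProcessor<Customer, Customer> asyncProcessor = new AsyncItemProcessor<>();
    asyncProcessor.setDelegate(item -> item);     // pass-through; real per-item work would go here
    asyncProcessor.setTaskExecutor(taskExecutor); // items are processed on separate threads
    return asyncProcessor;
}

@Bean
public AsyncItemWriter<Customer> asyncItemWriter(ItemWriter<Customer> writer) {
    AsyncItemWriter<Customer> asyncWriter = new AsyncItemWriter<>();
    asyncWriter.setDelegate(writer); // unwraps each Future and delegates to the real writer
    return asyncWriter;
}

The step would then be declared as .<Customer, Future<Customer>>chunk(100) with these beans as the processor and writer. Note that reading itself stays single-threaded with this approach; only processing and writing are offloaded.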
Approach 3: Partitioning - how can I use it when the PK is a UUID?
Recommended Answer
Partitioning (approach 3) is the best option IMO. If your primary key is a String, you can try to create a compound key (aka a combination of columns to make up a unique key).
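To make that concrete with one possible scheme not spelled out in the answer: since UUIDs are written as hex, you could slice the table by the leading hex digit of the id, giving 16 roughly even partitions, and let each worker step read only its slice. The Partitioner below is a hypothetical sketch along those lines:

@Bean
public Partitioner uuidPrefixPartitioner() {
    return gridSize -> {
        // One partition per leading hex digit of the UUID (16 slices)
        Map<String, ExecutionContext> partitions = new HashMap<>();
        String hexDigits = "0123456789abcdef";
        for (int i = 0; i < hexDigits.length(); i++) {
            ExecutionContext context = new ExecutionContext();
            context.putString("prefix", String.valueOf(hexDigits.charAt(i)));
            partitions.put("partition" + i, context);
        }
        return partitions;
    };
}

Each worker's step-scoped reader would then add a where clause such as cast(cust_id as varchar) like #{stepExecutionContext['prefix']} || '%', and the manager step would wire the workers with .partitioner("workerStep", uuidPrefixPartitioner()).gridSize(16).taskExecutor(taskExecutor).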