Spring Batch - How to read 5 million records in a faster way?
Problem Description
I'm developing a Spring Boot v2.2.5.RELEASE and Spring Batch example. In this example, I'm reading 5 million records with a JdbcPagingItemReader from a Postgres system in one data center and writing them into MongoDB in another data center.
This migration is too slow, and I need better performance from this batch job. I'm not sure how to use partitioning, because the PK of that table holds UUID values, so I can't think of using ColumnRangePartitioner. Is there a best approach to implement this?
Approach 1:
@Bean
public JdbcPagingItemReader<Customer> customerPagingItemReader() {
    // Read database records using JDBC in a paging fashion
    JdbcPagingItemReader<Customer> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(this.dataSource);
    // JDBC fetch size; note that the reader's page size is a separate
    // setting and defaults to 10 unless setPageSize(...) is called
    reader.setFetchSize(1000);
    reader.setRowMapper(new CustomerRowMapper());

    // Sort keys: paging requires a unique, ordered key
    Map<String, Order> sortKeys = new HashMap<>();
    sortKeys.put("cust_id", Order.ASCENDING);

    // Postgres implementation of a PagingQueryProvider using database-specific features
    PostgresPagingQueryProvider queryProvider = new PostgresPagingQueryProvider();
    queryProvider.setSelectClause("*");
    queryProvider.setFromClause("from customer");
    queryProvider.setSortKeys(sortKeys);
    reader.setQueryProvider(queryProvider);

    return reader;
}
Then for the Mongo writer, I've used Spring Data Mongo as a custom writer:
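A minimal sketch of what such a custom writer could look like, assuming a MongoTemplate bean; the bulk-insert body is illustrative, not the original code (the step below refers to this bean as writer(null)):

@Bean
public ItemWriter<Customer> writer(MongoTemplate mongoTemplate) {
    // Hypothetical sketch: bulk-insert each chunk of customers into MongoDB
    return items -> mongoTemplate.insert(items, Customer.class);
}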
Job details
@Bean
public Job multithreadedJob() {
    return this.jobBuilderFactory.get("multithreadedJob")
            .start(step1())
            .build();
}

@Bean
public Step step1() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4);
    taskExecutor.setMaxPoolSize(4);
    taskExecutor.afterPropertiesSet();

    return this.stepBuilderFactory.get("step1")
            // JdbcPagingItemReader is thread-safe, so it can back a multi-threaded step
            .<Customer, Customer>chunk(100)
            .reader(customerPagingItemReader())
            .writer(writer(null)) // the custom Spring Data Mongo writer
            .taskExecutor(taskExecutor)
            .build();
}
Approach 2: Would AsyncItemProcessor and AsyncItemWriter be the better option, given that I'd still have to read using the same JdbcPagingItemReader?
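For reference, a minimal sketch of how AsyncItemProcessor and AsyncItemWriter (from spring-batch-integration) could be wired; this is illustrative, not from the original post, and the pass-through delegate is an assumption:

@Bean
public AsyncItemProcessor<Customer, Customer> asyncItemProcessor(TaskExecutor taskExecutor) {
    AsyncItemProcessor<Customer, Customer> asyncProcessor = new AsyncItemProcessor<>();
    asyncProcessor.setDelegate(item -> item);     // pass-through; real per-item work would go here
    asyncProcessor.setTaskExecutor(taskExecutor); // items are processed on separate threads
    return asyncProcessor;
}

@Bean
public AsyncItemWriter<Customer> asyncItemWriter(ItemWriter<Customer> writer) {
    AsyncItemWriter<Customer> asyncWriter = new AsyncItemWriter<>();
    asyncWriter.setDelegate(writer); // unwraps each Future and delegates to the real writer
    return asyncWriter;
}

The step would then be declared as .<Customer, Future<Customer>>chunk(100) with these beans as the processor and writer. Note that reading itself stays single-threaded with this approach; only processing and writing are offloaded.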
Approach 3: Partitioning - how can I use it when the PK is a UUID?
Recommended Answer
Partitioning (approach 3) is the best option IMO. If your primary key is a String, you can try to create a compound key (aka a combination of columns to make up a unique key).
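To make that concrete with one possible scheme not spelled out in the answer: since UUIDs are written as hex, you could slice the table by the leading hex digit of the id, giving 16 roughly even partitions, and let each worker step read only its slice. The Partitioner below is a hypothetical sketch along those lines:

@Bean
public Partitioner uuidPrefixPartitioner() {
    return gridSize -> {
        // One partition per leading hex digit of the UUID (16 slices)
        Map<String, ExecutionContext> partitions = new HashMap<>();
        String hexDigits = "0123456789abcdef";
        for (int i = 0; i < hexDigits.length(); i++) {
            ExecutionContext context = new ExecutionContext();
            context.putString("prefix", String.valueOf(hexDigits.charAt(i)));
            partitions.put("partition" + i, context);
        }
        return partitions;
    };
}

Each worker's step-scoped reader would then add a where clause such as cast(cust_id as varchar) like #{stepExecutionContext['prefix']} || '%', and the manager step would wire the workers with .partitioner("workerStep", uuidPrefixPartitioner()).gridSize(16).taskExecutor(taskExecutor).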