Best Spring Batch scaling strategy

Question

We have simple batch processes that are working fine. Recently we got a new requirement to implement a batch process that generates reports. We have different data sources to read from to prepare these reports. Specifically, we might have one view for each report.

Now we want to scale this process so that it completes as early as possible.

I am familiar with multi-threaded steps, but I am not sure about the other strategies (remote chunking and partitioned steps) or when to use which one.

In our case, processing plus writing to file is more resource-intensive than reading.

Which approach is best suited to such a case?

Or, if we find that reading data from the database is as resource-intensive as processing plus writing to file, what is the best option for improving and scaling this process?

Recommended answer

TLDR;

Based on your description, I think you could try a Multi-threaded Step with Synchronized Reader, since you mention that processing and writing are the more expensive parts of your step.

However, seeing as your reader is a database, I think getting a partitioned step configured and working would be very beneficial. It takes a little more work to set up but will scale better in the long run.

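Multi-threaded Step

Use for:

  • Speeding up an individual step
  • When load balancing can be handled by the reader (e.g. JMS or AMQP)
  • When using a custom reader that manually partitions the data being read

Don't use for:

  • Stateful item readers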

Multi-threaded steps build on the chunk-oriented processing that Spring Batch uses. When you multi-thread a step, Spring Batch executes each chunk in its own thread. Note that this means the entire read-process-write cycle for your chunks of data occurs in parallel, so there is no guaranteed order in which your data is processed. Also note that this will not work with stateful ItemReaders (JdbcCursorItemReader and JdbcPagingItemReader are both stateful).
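To make the chunk-per-thread model concrete, here is a minimal framework-free sketch in plain Java. It is not Spring Batch code: the class name, the `run` method, and the `row-` prefix are all hypothetical, and a `ConcurrentLinkedQueue` stands in for a reader that is safe to share between threads (much as a message queue would be).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Framework-free sketch of a multi-threaded, chunk-oriented step.
// All names are hypothetical; this is not the Spring Batch API.
class MultiThreadedStepSketch {

    // Runs read-process-write chunk loops on `threads` workers until the source is empty.
    static List<String> run(Queue<Integer> source, int chunkSize, int threads)
            throws InterruptedException {
        Queue<String> sink = new ConcurrentLinkedQueue<>(); // thread-safe "writer" target
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                while (true) {
                    // READ: poll() on a concurrent queue is thread-safe,
                    // so no item is handed to two workers.
                    List<Integer> chunk = new ArrayList<>();
                    Integer item;
                    while (chunk.size() < chunkSize && (item = source.poll()) != null) {
                        chunk.add(item);
                    }
                    if (chunk.isEmpty()) {
                        return; // source exhausted
                    }
                    // PROCESS: transform each item of the chunk.
                    List<String> processed = new ArrayList<>();
                    for (Integer i : chunk) {
                        processed.add("row-" + i);
                    }
                    // WRITE: the chunk is written by the same thread that read it.
                    sink.addAll(processed);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return new ArrayList<>(sink);
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<Integer> source = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 20; i++) {
            source.add(i);
        }
        List<String> written = run(source, 5, 4);
        // Every item is written exactly once, but the order may differ between runs.
        System.out.println(written.size() + " items written: " + written);
    }
}
```

Each worker owns a whole chunk from read to write, which is why throughput improves while ordering is lost.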

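Multi-threaded Step with Synchronized Reader

Use for:

  • Speeding up processing and writing for an individual step
  • When reading is stateful

Don't use for:

  • Speeding up reading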

There is one way around the limitation of not being able to use multi-threaded steps with stateful item readers: you can synchronize their read() method. This essentially causes reads to happen serially (still with no guarantee on order) while allowing processing and writing to happen in parallel. This can be a good option when processing or writing is the bottleneck and reading is not.
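The idea can be sketched in plain Java as a thin wrapper whose read() is synchronized. Everything below is a hypothetical stand-in (`CursorReader`, `SynchronizedReader`, and the doubling "processor" are invented for illustration). Spring Batch ships a wrapper of this kind, `SynchronizedItemStreamReader`, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a stateful reader: it keeps a cursor and is not thread-safe.
class CursorReader {
    private final Iterator<Integer> cursor;

    CursorReader(List<Integer> rows) {
        this.cursor = rows.iterator();
    }

    // Returns the next row, or null when the input is exhausted.
    Integer read() {
        return cursor.hasNext() ? cursor.next() : null;
    }
}

// Wrapper that serializes access to the delegate's read() method.
class SynchronizedReader {
    private final CursorReader delegate;

    SynchronizedReader(CursorReader delegate) {
        this.delegate = delegate;
    }

    synchronized Integer read() {
        return delegate.read();
    }
}

class SynchronizedReaderSketch {

    // Reads serially through the wrapper; processes and writes on several threads.
    static List<Integer> run(List<Integer> rows, int threads) throws InterruptedException {
        SynchronizedReader reader = new SynchronizedReader(new CursorReader(rows));
        List<Integer> written = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                Integer row;
                while ((row = reader.read()) != null) { // only one thread reads at a time
                    written.add(row * 2);               // processing and writing overlap
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            rows.add(i);
        }
        List<Integer> written = run(rows, 4);
        System.out.println(written.size() + " rows written");
    }
}
```

Because the lock covers only read(), the expensive work after the read still runs concurrently; the reader itself never gets faster.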

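Partitioning

Use for:

  • Speeding up an individual step
  • When reading is stateful
  • When input data can be partitioned

Don't use for:

  • When input data cannot be partitioned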

Partitioning a step behaves slightly differently from a multi-threaded step. With a partitioned step you actually have completely distinct StepExecutions. Each StepExecution works on its own partition of the data. This way the readers do not run into problems reading the same data, because each reader only looks at a specific slice of the data. This method is extremely powerful but is also more complicated to set up than a multi-threaded step.
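As an illustration of what "its own partition of the data" can mean for a database reader, the sketch below splits a numeric id range into contiguous slices. It assumes the source view has a numeric id column, which the question does not state, and the class and method names are hypothetical. In Spring Batch this role is played by the `Partitioner` interface, which returns one `ExecutionContext` per partition; the sketch uses plain arrays instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Framework-free sketch of range partitioning; names are hypothetical.
class RangePartitionSketch {

    // Splits the inclusive id range [minId, maxId] into at most gridSize contiguous slices.
    static Map<String, long[]> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be at least 1");
        }
        long total = maxId - minId + 1;
        long sliceSize = (total + gridSize - 1) / gridSize; // ceiling division
        Map<String, long[]> slices = new LinkedHashMap<>();
        long start = minId;
        int index = 0;
        while (start <= maxId) {
            long end = Math.min(start + sliceSize - 1, maxId);
            slices.put("partition" + index, new long[] {start, end});
            start = end + 1;
            index++;
        }
        return slices;
    }

    public static void main(String[] args) {
        // Each slice would drive one reader, e.g. via "WHERE id BETWEEN ? AND ?".
        partition(1, 10, 3).forEach((name, range) ->
                System.out.println(name + ": id BETWEEN " + range[0] + " AND " + range[1]));
        // prints:
        // partition0: id BETWEEN 1 AND 4
        // partition1: id BETWEEN 5 AND 8
        // partition2: id BETWEEN 9 AND 10
    }
}
```

Since the slices never overlap, each reader can stay stateful without coordinating with the others.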

Remote Chunking

Use for:

  • Speeding up processing and writing for an individual step
  • Stateful readers

Don't use for:

  • Speeding up reading

Remote chunking is very advanced Spring Batch usage. It requires some form of durable middleware to send and receive messages on (e.g. JMS or AMQP). With remote chunking, reading is still single-threaded, but as each chunk is read it is sent to another JVM for processing. In practice this is very similar to how a multi-threaded step works; however, remote chunking can use more than one process rather than more than one thread. This means that remote chunking lets you scale your application horizontally rather than vertically. (TBH, I think that if you are considering implementing remote chunking, you should take a look at something like Hadoop.)

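Parallel Steps

Use for:

  • Speeding up overall job execution
  • When there are independent steps that do not rely on each other

Don't use for:

  • Speeding up step execution
  • Dependent steps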

Parallel steps are useful when you have one or more steps that can execute independently. Spring Batch makes it easy to execute steps in parallel in separate threads.
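A rough plain-Java picture of this, assuming two independent report steps followed by one step that needs both results. The step names and return values are invented for illustration. In Spring Batch itself, parallel steps are typically expressed as a split flow (for example through `FlowBuilder`'s `split(TaskExecutor)`), which this sketch does not use.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Framework-free sketch of parallel steps; names are hypothetical.
class ParallelStepsSketch {

    static String run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Two independent "steps" start at the same time on separate threads.
            CompletableFuture<String> salesReport =
                    CompletableFuture.supplyAsync(() -> "sales-report", pool);
            CompletableFuture<String> stockReport =
                    CompletableFuture.supplyAsync(() -> "stock-report", pool);
            // A dependent "step" runs only after both have finished.
            return salesReport.thenCombine(stockReport, (a, b) -> a + "+" + b).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "sales-report+stock-report"
    }
}
```

This shortens the job as a whole, but each individual step still takes as long as it did before.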
