Spring Batch: Propagate exception encountered in partitioned step (stop job execution)

Problem description

Background

I currently have a spring-batch job that reads a flat file. The job uses a MultiResourcePartitioner to read physical partitions of a file that has been split into N number of smaller files. This means that each physical partition of the file will result in a new slave step being executed that reads the partition.
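
For reference, a minimal sketch of such a setup, assuming a Spring Batch 5 style Java configuration; the bean names, file pattern and the assumption that a workerStep bean exists are illustrative, not the original job's configuration:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class PartitionedFileJobConfig {

    // Master step: MultiResourcePartitioner creates one partition, and therefore one
    // worker (slave) step execution, per split file matching the pattern.
    @Bean
    public Step masterStep(JobRepository jobRepository,
                           Step workerStep,
                           @Value("file:/tmp/input/part-*.csv") Resource[] splitFiles) {
        MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
        partitioner.setResources(splitFiles);

        return new StepBuilder("masterStep", jobRepository)
                .partitioner("workerStep", partitioner)        // one worker execution per resource
                .step(workerStep)                              // reads a single partition
                .taskExecutor(new SimpleAsyncTaskExecutor())   // run partitions in parallel
                .build();
    }
}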

Problem

If there is any issue reading any physical partition, the execution of that slave step will fail and the exception will be logged by spring batch. This does not impact the execution of the remaining slave steps that are reading different physical partitions of the file; however, this is not the desired behavior. What I want is that if there is an issue reading a particular physical partition (Example : not being able to parse a particular column), the exception should be propagated to the location where the Job was launched so that I can halt any further processing.

The current implementation of the execute method in AbstractStep catches Throwable and suppresses the exception by logging it. As a result, the exception is not propagated to the location where the Job was launched and there is no way to halt the execution of the remaining slave steps.

How can I make spring-batch propagate any exception that occurs in a slave step all the way to the location where the Job was launched? I want to do this so that I can halt any further processing if there is an issue processing any of the partitioned files.
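
As a side note on where those suppressed exceptions actually end up: they are recorded on the JobExecution and StepExecutions returned by the JobLauncher, so the launch site can at least detect the failure after the job has finished. A small sketch (the jobLauncher, job and jobParameters variables are assumed to exist in the launching code):

JobExecution execution = jobLauncher.run(job, jobParameters);

if (execution.getStatus() == BatchStatus.FAILED) {
    // The exceptions caught inside AbstractStep are not rethrown to the caller;
    // they are attached to the individual step executions instead.
    for (StepExecution stepExecution : execution.getStepExecutions()) {
        stepExecution.getFailureExceptions().forEach(Throwable::printStackTrace);
    }
}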

Recommended answer


If there is any issue reading any physical partition, the execution of that slave step will fail and the exception will be logged by spring batch. This does not impact the execution of the remaining slave steps that are reading different physical partitions of the file; however, this is not the desired behavior.

I would argue that the fact that "this does not impact the execution of the remaining slave steps" is the desired behaviour. Usually, the idea behind partitioning a big chunk of work into smaller tasks executed in parallel is that the tasks should be independent from each other, and one failure should not impact the others. If there is logic that requires the failure of one task to stop the others, it means the tasks are not well defined as independent, and executing them in a local/remote partitioned step is not the appropriate choice in the first place.


What I want is that if there is an issue reading a particular physical partition (Example : not being able to parse a particular column), the exception should be propagated to the location where the Job was launched so that I can halt any further processing.

You need a custom PartitionHandler for that. This is the piece that coordinates worker steps. The default behaviour is to wait for all worker steps to finish and aggregate the results before reporting to the main job. Your custom implementation should detect the failure of any worker step and inform the others to stop.
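
A rough sketch of that idea for a local, thread-based setup is shown below. The class name, the use of CompletableFuture and the way the grid size and worker step are wired in are illustrative assumptions; a production implementation would more likely build on TaskExecutorPartitionHandler and reuse its infrastructure:

import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Collectors;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.partition.PartitionHandler;
import org.springframework.batch.core.partition.StepExecutionSplitter;

// Sketch of a "fail-fast" PartitionHandler: each worker partition runs on its own
// thread and, as soon as one of them fails, the remaining executions are flagged
// with setTerminateOnly() so they stop at the next chunk boundary.
public class FailFastPartitionHandler implements PartitionHandler {

    private final Step workerStep;  // the slave step that reads a single partition
    private final int gridSize;

    public FailFastPartitionHandler(Step workerStep, int gridSize) {
        this.workerStep = workerStep;
        this.gridSize = gridSize;
    }

    @Override
    public Collection<StepExecution> handle(StepExecutionSplitter splitter,
                                            StepExecution masterStepExecution) throws Exception {
        Set<StepExecution> partitions = splitter.split(masterStepExecution, gridSize);
        List<StepExecution> all = new CopyOnWriteArrayList<>(partitions);

        List<CompletableFuture<Void>> futures = all.stream()
                .map(execution -> CompletableFuture.runAsync(() -> {
                    try {
                        workerStep.execute(execution);
                    } catch (Exception e) {
                        execution.setStatus(BatchStatus.FAILED);
                    }
                    // If this partition failed, ask the other running partitions to stop.
                    if (execution.getStatus() == BatchStatus.FAILED) {
                        all.stream()
                           .filter(other -> other != execution)
                           .forEach(StepExecution::setTerminateOnly);
                    }
                }))
                .collect(Collectors.toList());

        futures.forEach(CompletableFuture::join);
        return all;
    }
}

Wired into the master step (for example via the partition step builder's partitionHandler(...) method), a failed partition would then cause the remaining partitions to terminate early and the master step, and therefore the job, to end as FAILED.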

Moreover, this design of stopping/failing all workers if one of them fails is not appropriate for job restart. Restarting the job would then restart all partitions, which defeats the purpose of a partitioned job, where only the failed partitions should be restarted.
