Spring Batch: correctly restarting uncompleted jobs in a clustered environment


Question

I use the following logic to restart uncompleted jobs in a single-node Spring Batch application:

public void restartUncompletedJobs() {

    try {
        jobRegistry.register(new ReferenceJobFactory(documetPipelineJob));

        List<String> jobs = jobExplorer.getJobNames();
        for (String job : jobs) {
            Set<JobExecution> runningJobs = jobExplorer.findRunningJobExecutions(job);

            for (JobExecution runningJob : runningJobs) {
                runningJob.setStatus(BatchStatus.FAILED);
                runningJob.setEndTime(new Date());
                jobRepository.update(runningJob);
                jobOperator.restart(runningJob.getId());
            }
        }
    } catch (Exception e) {
        LOGGER.error(e.getMessage(), e);
    }
}

I'm now trying to make it work on a two-node cluster. The application on each node points to the same shared PostgreSQL database.

Consider the following example: I have two job instances. jobInstance1 is running on node1 and jobInstance2 is running on node2. Node1 is restarted for some reason while jobInstance1 is executing. After node1 comes back up, the Spring Batch application tries to restart the uncompleted jobs with the logic shown above. It sees two uncompleted job instances, jobInstance1 and jobInstance2 (which is running correctly on node2), and tries to restart both of them. So instead of restarting only jobInstance1, it restarts both jobInstance1 and jobInstance2, even though jobInstance2 should not be restarted because it is still executing correctly on node2.

How can I, during application startup, correctly restart the jobs that were left uncompleted when the previous application instance terminated, without also restarting jobs like jobInstance2?

Update

This is the solution provided in the answer below:

  1. Get the job instances of your job with JobOperator#getJobInstances
  2. For each instance, check if there is a running execution using JobOperator#getExecutions.

2.1 If there is a running execution, move to next instance (in order to let the execution finish either successfully or with a failure)

2.2 If there is no currently running execution, check the status of the last execution and restart it if failed using JobOperator#restart.

I have a question regarding #2.1: after the application restarts, will Spring Batch automatically restart uncompleted jobs that have a running execution, or do I need to do that manually?

Recommended answer

Your logic is not restarting uncompleted jobs. It takes the currently running job executions, sets their status to FAILED, and restarts them. Instead of finding running executions, it should look for executions that are not currently running, especially failed ones, and restart those.

How to correctly restart the failed jobs and prevent jobs like jobInstance2 from being restarted as well?

In pseudocode, what you need to do is:

  1. Get the job instances of your job with JobOperator#getJobInstances
  2. For each instance, check if there is a running execution using JobOperator#getExecutions.

2.1 If there is a running execution, move to next instance (in order to let the execution finish either successfully or with a failure)

2.2 If there is no currently running execution, check the status of the last execution and restart it if failed using JobOperator#restart.

In your case:

  • jobInstance1 should be restarted in step 2.2
  • jobInstance2 should be filtered in step 2.1 since there is a running execution for it on node 2.
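
The steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's code. So that it compiles without Spring Batch on the classpath, it is written against a hypothetical `BatchOps` interface. Its `getJobInstances`, `getExecutions` and `restart` methods mirror the `JobOperator` methods named in the answer, while `statusOf` is an invented helper; with the real library the status could be read from `JobExplorer#getJobExecution(executionId).getStatus()`. The `Status` enum, the `InMemoryOps` fake, the job name and the choice of "highest execution id is the most recent" are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RestartFailedJobs {

    /** Subset of batch statuses used by this sketch (stand-in for BatchStatus). */
    enum Status { STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED }

    /** Hypothetical stand-in for JobOperator/JobExplorer; not a Spring Batch type. */
    interface BatchOps {
        List<Long> getJobInstances(String jobName, int start, int count);
        List<Long> getExecutions(long instanceId);
        Status statusOf(long executionId); // invented helper, see lead-in
        Long restart(long executionId);
    }

    /** Statuses this sketch treats as "currently running". */
    static boolean isRunning(Status s) {
        return s == Status.STARTING || s == Status.STARTED || s == Status.STOPPING;
    }

    /** Restarts the last execution of each instance that is idle and failed. */
    static List<Long> restartFailedExecutions(BatchOps ops, String jobName, int maxInstances) {
        List<Long> restarted = new ArrayList<>();
        // Step 1: get the job instances of the job.
        for (Long instanceId : ops.getJobInstances(jobName, 0, maxInstances)) {
            // Step 2: look at the executions of this instance.
            List<Long> executions = ops.getExecutions(instanceId);
            if (executions.isEmpty()) {
                continue;
            }
            // Step 2.1: a running execution exists, so leave this instance alone.
            boolean running = executions.stream().anyMatch(id -> isRunning(ops.statusOf(id)));
            if (running) {
                continue;
            }
            // Step 2.2: restart the last execution only if it failed.
            // Assumption: the highest execution id is the most recent one.
            long last = Collections.max(executions);
            if (ops.statusOf(last) == Status.FAILED) {
                ops.restart(last);
                restarted.add(last);
            }
        }
        return restarted;
    }

    /** In-memory fake used only to demonstrate the logic; ignores paging and job name. */
    static class InMemoryOps implements BatchOps {
        final Map<Long, List<Long>> executionsByInstance = new LinkedHashMap<>();
        final Map<Long, Status> statusByExecution = new HashMap<>();
        final List<Long> restartCalls = new ArrayList<>();

        void add(long instanceId, long executionId, Status status) {
            executionsByInstance.computeIfAbsent(instanceId, k -> new ArrayList<>()).add(executionId);
            statusByExecution.put(executionId, status);
        }
        public List<Long> getJobInstances(String jobName, int start, int count) {
            return new ArrayList<>(executionsByInstance.keySet());
        }
        public List<Long> getExecutions(long instanceId) {
            return executionsByInstance.getOrDefault(instanceId, List.of());
        }
        public Status statusOf(long executionId) {
            return statusByExecution.get(executionId);
        }
        public Long restart(long executionId) {
            restartCalls.add(executionId);
            return executionId + 1000; // made-up id for the new execution
        }
    }

    public static void main(String[] args) {
        InMemoryOps ops = new InMemoryOps();
        ops.add(1L, 10L, Status.FAILED);  // jobInstance1: last execution failed
        ops.add(2L, 20L, Status.STARTED); // jobInstance2: still running on node2
        System.out.println(restartFailedExecutions(ops, "documentPipelineJob", 100)); // prints [10]
    }
}
```

The sketch takes the status stored in the job repository at face value. How an execution orphaned by a crashed node ends up marked as FAILED is outside its scope and would need to be handled separately.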
