How to reset luigi task status?


Question

Currently, I have a bunch of luigi tasks queued together with a simple dependency chain (a -> b -> c -> d). d gets executed first and a last. a is the task that gets triggered.

All the tasks except a return a luigi.LocalTarget() object as their output and have a single generic luigi.Parameter(), which is a string (containing a date and a time). Everything runs on a luigi central server (which has history enabled).

The problem is that when I rerun task a, luigi checks the history to see whether that particular task has been run before. If it had a status of DONE, it doesn't run the tasks (d in this case), and I can't have that. Changing the string isn't helping (I added a random microsecond to it). How do I force a task to run?

Recommended answer

First a comment: Luigi tasks are expected to be idempotent. If you run a task with the same parameter values, no matter how many times you run it, it must always produce the same outputs, so it doesn't make sense to run it more than once. This is what makes Luigi powerful: if you have one big task that does a lot of things and takes a lot of time, and it fails somewhere, you'll have to run it again from the beginning. If you split it into smaller tasks and the run fails, you'll only have to run the remaining tasks in the pipeline.

When you run a task, Luigi checks the outputs of that task to see if they exist. If they don't, Luigi checks the outputs of the tasks it depends on. If those exist, it runs only the current task and generates its output Target. If the dependencies' outputs don't exist, it runs those tasks first.
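The check described above can be modelled in a few lines. This is a simplified, Luigi-free sketch of the decision, not Luigi's actual scheduler: `FileTask`, `build`, and the file names are hypothetical stand-ins. In Luigi itself, the default `Task.complete()` returns true when every target returned by `output()` exists.

```python
import os
import tempfile


class FileTask:
    """Hypothetical stand-in for a Luigi task with a single file output."""

    def __init__(self, name, workdir, requires=()):
        self.name = name
        self.path = os.path.join(workdir, name + ".txt")
        self.requires = list(requires)

    def complete(self):
        # A task counts as done when its output file exists.
        return os.path.exists(self.path)

    def run(self):
        with open(self.path, "w") as handle:
            handle.write(self.name + "\n")


def build(task, executed):
    """Run `task` and any incomplete dependencies, recording what actually ran."""
    if task.complete():
        return  # output already exists, so nothing upstream is inspected
    for dep in task.requires:
        build(dep, executed)
    task.run()
    executed.append(task.name)


workdir = tempfile.mkdtemp()
d = FileTask("d", workdir)
c = FileTask("c", workdir, [d])
b = FileTask("b", workdir, [c])

first_run = []
build(b, first_run)    # no outputs exist yet: d, c and b all run

second_run = []
build(b, second_run)   # every output exists: nothing runs

os.remove(c.path)
os.remove(b.path)
third_run = []
build(b, third_run)    # d's output still exists, so only c and b run
```

In this model the third build skips d because its output file is still on disk, which matches the behaviour described in the question.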

So, if you want to rerun a task, you must delete its output Targets. And if you want to rerun the whole pipeline, you must delete, in cascade, the outputs of all the tasks that the task depends on.
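A cascade delete along these lines might look like the sketch below. It is an illustration, not the script from the linked discussion: `flatten` and `remove_outputs` are hypothetical helpers, and `FakeTarget` and `FakeTask` are stand-ins so the example runs without Luigi installed. The helper relies only on `requires()` and `output()` on the task and on `exists()`, `remove()`, and `path` on the target, which `luigi.Task` and `luigi.LocalTarget` provide.

```python
import os
import tempfile


def flatten(value):
    """Normalise None, a single item, a list/tuple, or a dict into a flat list."""
    if value is None:
        return []
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, (list, tuple)):
        items = []
        for item in value:
            items.extend(flatten(item))
        return items
    return [value]


def remove_outputs(task, removed=None):
    """Delete existing outputs of `task` and of everything it requires."""
    if removed is None:
        removed = []
    for target in flatten(task.output()):
        if target.exists():
            target.remove()
            removed.append(target.path)
    for dep in flatten(task.requires()):
        remove_outputs(dep, removed)
    return removed


class FakeTarget:
    """Stand-in exposing the same three members used on a local file target."""

    def __init__(self, path):
        self.path = path

    def exists(self):
        return os.path.exists(self.path)

    def remove(self):
        os.remove(self.path)


class FakeTask:
    """Stand-in task: one optional output target and a list of dependencies."""

    def __init__(self, target=None, deps=()):
        self._target = target
        self._deps = list(deps)

    def output(self):
        return self._target

    def requires(self):
        return self._deps


workdir = tempfile.mkdtemp()
paths = [os.path.join(workdir, name + ".txt") for name in ("d", "c", "b")]
for path in paths:
    open(path, "w").close()  # pretend each task has already produced its output

d = FakeTask(FakeTarget(paths[0]))
c = FakeTask(FakeTarget(paths[1]), [d])
b = FakeTask(FakeTarget(paths[2]), [c])
a = FakeTask(None, [b])  # like task a in the question: no output target

removed = remove_outputs(a)
```

After this runs, none of the output files exist, so the next build would execute the whole chain again. Deleting files is destructive, so it is worth printing the collected paths and reviewing them before calling `remove()` on real data.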

There's an ongoing discussion in this issue in the Luigi repository. Take a look at this comment, since it points to some scripts for getting the output targets of a given task and removing them.
