Persist Completed Pipeline in Luigi Visualiser

Question

I'm starting to port a nightly data pipeline from a visual ETL tool to Luigi, and I really enjoy that there is a visualiser to see the status of jobs. However, I've noticed that a few minutes after the last job (named MasterEnd) completes, all of the nodes disappear from the graph except for MasterEnd. This is a little inconvenient, as I'd like to see that everything is complete for the day/past days.

Further, if in the visualiser I go directly to the last job's URL, it can't find any history that it ran: Couldn't find task MasterEnd(date=2015-09-17, base_url=http://aws.east.com/, log_dir=/home/ubuntu/logs/). I have verified that it ran successfully this morning.

One thing to note is that I have a cron that runs this pipeline every 15 minutes to check for a file on S3. If it exists, it runs, otherwise it stops. I'm not sure if that is causing the removal of tasks from the visualiser or not. I've noticed it generates a new PID every run, but I couldn't find a way to persist one PID/day in the docs.

So, my questions: Is it possible to persist the completed graph for the current day in the visualiser? And is there a way to see what has happened in the past?

Thanks for any help.

Recommended answer

I'm not 100% sure this is correct, but this is what I would try first. When you call luigi.run, pass it --scheduler-remove-delay. I'm guessing this is how long the scheduler waits before forgetting a task after all of its dependents have completed. If you look through Luigi's source, the default is 600 seconds, which would match the nodes disappearing a few minutes after MasterEnd finishes. For example, to keep them for a full day (86400 seconds):

# 86400 s = 24 h; task_name is your top-level Task class (MasterEnd in the question)
luigi.run(["--workers", "8", "--scheduler-remove-delay", "86400"], main_task_cls=task_name)
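
If you would rather think in hours than in seconds, the argument list can be built by a small helper. This is only a sketch: build_luigi_args and keep_hours are hypothetical names of mine, not part of Luigi, and it relies on the same guess as above about what --scheduler-remove-delay does. If the visualiser is served by a separate luigid process, the delay may have to be set in that process's configuration instead, which I have not verified.

```python
SECONDS_PER_HOUR = 3600


def build_luigi_args(workers, keep_hours):
    """Build the argument list for luigi.run (hypothetical helper, not a Luigi API).

    keep_hours is converted to whole seconds for --scheduler-remove-delay.
    """
    remove_delay = int(keep_hours * SECONDS_PER_HOUR)
    return [
        "--workers", str(workers),
        "--scheduler-remove-delay", str(remove_delay),
    ]


if __name__ == "__main__":
    # Prints: ['--workers', '8', '--scheduler-remove-delay', '86400']
    print(build_luigi_args(workers=8, keep_hours=24))
```

You would then call luigi.run(build_luigi_args(8, 24), main_task_cls=task_name) exactly as in the one-liner above.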
