Celery time statistics per-task-name


Question

I have some fairly busy Celery queues, but I'm not sure which tasks are the problematic ones. Is there a way to aggregate results to figure out which tasks are taking a long time? I have 10-20 workers on 2-4 servers.

I'm using Redis as the broker and as the result backend as well. I noticed the busy queues on Flower, but can't figure out how to get time statistics aggregated per task.

Answer

Approach 1:

If you have enabled logging when the Celery workers are started, they log the time taken for each task.

$ celery worker -l info -A your_app --logfile celery.log

This will generate logs like this:

[2016-06-04 13:21:30,749: INFO/MainProcess] Task sig.add[a8b648eb-9674-44f0-90bd-71cfebe22f2f] succeeded in 0.00979363399983s: 3
[2016-06-04 13:21:30,973: INFO/MainProcess] Received task: sig.add[7fd422e6-8f48-4dd2-90de-e213afbedc38]
[2016-06-04 13:21:30,982: WARNING/Worker-2] called by small_task. LOL {'signal': <Signal: Signal>, 'result': 3, 'sender': <@task: sig.add of tasks:0x7fdf33146c50>}

You can filter the lines containing succeeded in. Split these lines using space, [ and : as delimiters, print the task name and the time taken by each task, then sort all the lines.

$ grep ' succeeded in ' celery.log | awk -F'[ :[]' '{print $9, $13}' | sort
sig.add 0.00775764500031s
sig.add 0.00802627899975s
sig.foo 12.00813863099938s
sig.foo 15.00871706100043s
sig.foo 12.00979363399983s

As you can see, add is very fast & foo is slow.
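If you'd rather do the aggregation in Python than with grep/awk, a small parser over the same celery.log can compute per-task counts, averages and maxima. This is only a sketch assuming the default "Task <name>[<id>] succeeded in <t>s" line format shown above; adjust the regex if your Celery version formats log lines differently.

```python
# Sketch: aggregate per-task timings from the celery.log shown above.
# Assumes the default "Task <name>[<id>] succeeded in <t>s" log format.
import re
from collections import defaultdict

LINE_RE = re.compile(r"Task (\S+)\[[^\]]+\] succeeded in (\d+\.\d+)s")

def aggregate(log_lines):
    """Return {task_name: (count, total_seconds, max_seconds)}."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip "Received task" and other non-timing lines
        name, seconds = m.group(1), float(m.group(2))
        entry = stats[name]
        entry[0] += 1
        entry[1] += seconds
        entry[2] = max(entry[2], seconds)
    return {name: tuple(entry) for name, entry in stats.items()}
```

For the sample log above, `aggregate(open('celery.log'))` would return something like `{'sig.add': (2, 0.0158, 0.0080), 'sig.foo': (3, 39.03, 15.01)}`.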

Approach 2:

Celery has task_prerun and task_postrun signals which fire before/after each task. You can hook up handler functions which track the time taken and then record it somewhere.

from time import time
from collections import namedtuple

from celery.signals import task_prerun, task_postrun


tasks = {}           # task_id -> start timestamp
task_avg_time = {}   # task name -> Average(cum_avg, count)
Average = namedtuple('Average', 'cum_avg count')


@task_prerun.connect
def task_prerun_handler(signal, sender, task_id, task, args, kwargs):
    tasks[task_id] = time()


@task_postrun.connect
def task_postrun_handler(signal, sender, task_id, task, args, kwargs, retval, state):
    started = tasks.pop(task_id, None)
    if started is None:
        # the prerun handler never fired for this task_id
        return
    cost = time() - started

    try:
        cum_avg, count = task_avg_time[task.name]
        new_count = count + 1
        new_avg = ((cum_avg * count) + cost) / new_count
        task_avg_time[task.name] = Average(new_avg, new_count)
    except KeyError:
        task_avg_time[task.name] = Average(cost, 1)

    # write to redis: task_avg_time
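The final `# write to redis` comment could be fleshed out along the lines of the sketch below. The third-party `redis` package, the hash key name `celery:task_avg_time`, and both helper names are my assumptions for illustration, not part of the original answer.

```python
# Sketch of the "write to redis" step. The redis package (pip install redis),
# the hash key name and the helper names are assumptions, not Celery APIs.
import json
from collections import namedtuple

Average = namedtuple('Average', 'cum_avg count')

def serialize_averages(task_avg_time):
    # Flatten each Average into a JSON string, one hash field per task name.
    return {name: json.dumps({"avg": avg.cum_avg, "count": avg.count})
            for name, avg in task_avg_time.items()}

def flush_to_redis(task_avg_time, url="redis://localhost:6379/0"):
    import redis  # third-party; imported here so the rest runs without it
    r = redis.Redis.from_url(url)
    # Inspect later with: redis-cli HGETALL celery:task_avg_time
    r.hset("celery:task_avg_time", mapping=serialize_averages(task_avg_time))
```

Calling `flush_to_redis(task_avg_time)` periodically would make the aggregates visible via redis-cli from any of the servers.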

Reference: https://stackoverflow.com/a/31731622/2698552
