Monitoring short-lived Python batch job processes using Prometheus


Question

How can I monitor my Python processes (say, a script that is triggered periodically by the cron daemon) using Prometheus?

Note that this is not a web application but a short-lived process launched periodically by the cron daemon. The script starts, does its job, and terminates. Cron launches the same Python script many times a day (approximately 100k times). I want to capture several statistics from each run of this script, for example the time it takes to run a particular function and how much CPU and memory the process consumes.

Recommended answer

You may want to look at Prometheus' Pushgateway: when your script completes, it can push the metrics it collected (e.g. a histogram of how long your function calls took, total CPU utilization, peak memory utilization). Prometheus then scrapes the Pushgateway like any other target, so the script does not have to stay alive long enough to be scraped itself.
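As a rough illustration of that flow, the sketch below times a function, reads CPU time and peak memory from the standard `resource` module (Unix only), and pushes the values at exit. It assumes the `prometheus_client` package is installed and a Pushgateway is reachable at `localhost:9091`; the metric names, the job name `batch_job`, and the `do_work` function are hypothetical placeholders. Gauges are used instead of a histogram to keep the example small.

```python
import resource
import time

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway


def do_work():
    """Hypothetical stand-in for the real job body."""
    return sum(i * i for i in range(100_000))


def run_and_collect():
    """Run the job once and record its stats in a fresh registry."""
    registry = CollectorRegistry()
    duration = Gauge(
        "batch_job_duration_seconds",
        "Wall-clock time spent in do_work()",
        registry=registry,
    )
    cpu = Gauge(
        "batch_job_cpu_seconds",
        "User plus system CPU time used by this process",
        registry=registry,
    )
    peak_rss = Gauge(
        "batch_job_peak_rss",
        "Peak resident set size as reported by getrusage "
        "(kilobytes on Linux, bytes on macOS)",
        registry=registry,
    )

    start = time.perf_counter()
    do_work()
    duration.set(time.perf_counter() - start)

    usage = resource.getrusage(resource.RUSAGE_SELF)
    cpu.set(usage.ru_utime + usage.ru_stime)
    peak_rss.set(usage.ru_maxrss)
    return registry


if __name__ == "__main__":
    registry = run_and_collect()
    # Assumed Pushgateway address; replace with your own.
    push_to_gateway("localhost:9091", job="batch_job", registry=registry)
```

Using a dedicated `CollectorRegistry` rather than the default one keeps the push limited to the metrics the script sets explicitly.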

You seem to be saying your script will run approximately once a second (100k runs spread over the 86,400 seconds in a day). I am hoping that means something along the lines of "once every 5 minutes for each of 300 tenants". In a case like this, you would push your metrics with something like a tenant_id label and be able to see either per-tenant or aggregated metrics.
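To make the per-tenant idea concrete, here is a minimal standard-library sketch of what such a push looks like over HTTP. The Pushgateway takes the grouping labels from the URL path (`/metrics/job/<job>/<label>/<value>`) and the metrics from a body in the Prometheus text exposition format. The gateway address, job name, and metric name are hypothetical, and the sketch assumes label values are plain identifiers without slashes.

```python
import urllib.parse
import urllib.request


def push_url(gateway, job, grouping):
    """Build the push URL; each grouping label becomes a /name/value path pair."""
    parts = ["metrics", "job", urllib.parse.quote(job, safe="")]
    for name, value in grouping.items():
        parts += [name, urllib.parse.quote(str(value), safe="")]
    return gateway.rstrip("/") + "/" + "/".join(parts)


def render(metrics):
    """Render a dict of gauge values in the text exposition format."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def push(gateway, job, grouping, metrics):
    """PUT the metrics, replacing whatever was stored for this grouping key."""
    request = urllib.request.Request(
        push_url(gateway, job, grouping),
        data=render(metrics).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Assumed gateway address and tenant; replace with your own.
    push(
        "http://localhost:9091",
        "batch_job",
        {"tenant_id": 42},
        {"batch_job_duration_seconds": 1.5},
    )
```

Each distinct `tenant_id` gets its own group in the Pushgateway, so one tenant's push does not overwrite another's, and a query can aggregate across the label with something like `sum without (tenant_id) (batch_job_duration_seconds)`.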

If your script runs once a second with the same parameters/configuration, then you'll probably lose some of the metrics: multiple runs may terminate within the same second and all push their metrics, and because the Pushgateway keeps only the latest push for a given grouping key, only the last one's metrics will get collected by Prometheus (as I believe you can't set a collection interval lower than 1 second in Prometheus).
