Python logging causing latencies?


Question

I'm developing a real-time REST API using Python 3 + Bottle/uWSGI. I was experiencing latencies in my code, sometimes hundreds of milliseconds, which really matters in my app.

Using the logging module, I was trying to identify the slow parts of my code by printing how long individual code blocks took to run. I know this is an extremely crude way of profiling, but sometimes it does the job quite well.
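
For reference, the kind of ad-hoc block timing meant here looks roughly like this (a minimal sketch, not the actual application code; do_expensive_block() is just a placeholder for the code under test):

import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s  %(message)s")

start = time.time()
do_expensive_block()                                   # placeholder for the block being measured
logging.info('block took %.1f ms', (time.time() - start) * 1000)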

Even though I identified some slow parts, I was still missing something: the individual parts seemed to take tens of milliseconds, yet as a whole they very often took hundreds. After some increasingly insane experiments that drove me almost completely nuts, I arrived at the following:

import logging
import time

t = round(100 * time.time())    # coarse timestamp tag shared by both lines
logging.info('[%s] Foo' % t)
logging.info('[%s] Bar' % t)

Surprisingly, it gives:

2014-07-16 23:21:23,531  [140554568353] Foo
2014-07-16 23:21:24,312  [140554568353] Bar

Even though this seems hard to believe, there are two consecutive logging.info() calls, and for some reason there is a gap of almost 800 ms between them. Can anybody tell me what is going on? It is noteworthy that if there are multiple info() calls, the latency appears only once per API method, most frequently at its very beginning (after the first call). My only hypothesis is disk latency: several (but not that many!) workers run simultaneously, and I'm using a rotational disk, not an SSD. But I thought there are buffers and that the OS would do the disk flush asynchronously for me. Am I wrong in my assumptions? Should I avoid logging completely to avoid the latencies?

EDIT

Based on Vinay Sajip's suggestion, I switched to the following initialization code:

import logging
import logging.handlers
import queue

log_que = queue.Queue(-1)                                   # unbounded queue between app and writer
queue_handler = logging.handlers.QueueHandler(log_que)      # cheap: only enqueues records
log_handler = logging.StreamHandler()                       # the handler that actually writes
queue_listener = logging.handlers.QueueListener(log_que, log_handler)
queue_listener.start()                                      # background thread drains the queue
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s  %(message)s", handlers=[queue_handler])
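
One detail worth noting with this kind of setup: the listener should be stopped at shutdown so any records still sitting in the queue get flushed. A minimal sketch, reusing the names above:

import atexit

# QueueListener.stop() waits for queued records to be processed before its
# worker thread exits, so nothing buffered at shutdown is silently dropped.
atexit.register(queue_listener.stop)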

It seems to work fine (if I comment out queue_listener.start(), there is no output), but the very same latencies still appear. I don't see how that is even possible; the call should be non-blocking. I also put gc.collect() at the end of each request to make sure the problem is not caused by the garbage collector, without any effect. I also tried turning logging off for a whole day: the latencies disappeared, so I think their source must be in the logging module...
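
As an aside, the "turn logging off" experiment can be reproduced without deleting any calls by using the standard logging.disable() switch (a minimal sketch):

import logging

logging.disable(logging.CRITICAL)   # make every logging call a cheap no-op
# ... run the latency test ...
logging.disable(logging.NOTSET)     # restore normal logging afterwards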

Answer

Try async logging provided by Logbook

As hasan has proposed, an async log handler can be the way to go.

Recently I have tried Logbook and can say it will provide everything you need for this: a ZeroMQHandler as well as a ZeroMQSubscriber.
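
A minimal sketch of that split, assuming Logbook's logbook.queues module and an arbitrary local endpoint (tcp://127.0.0.1:5000); the exact handler and subscriber options should be checked against the Logbook documentation:

# In each web worker: publish log records over ZeroMQ instead of writing them.
from logbook import Logger
from logbook.queues import ZeroMQHandler

ZeroMQHandler('tcp://127.0.0.1:5000').push_application()     # endpoint is arbitrary
log = Logger('api')
log.info('Foo')    # hands the record to a socket; no disk I/O in the request path

# In a separate process: receive the records and do the actual writing.
from logbook import StderrHandler
from logbook.queues import ZeroMQSubscriber

StderrHandler().push_application()                           # the real output handler lives here
ZeroMQSubscriber('tcp://127.0.0.1:5000').dispatch_forever()  # blocks, dispatching incoming records

With several uWSGI workers publishing at once, the bind/connect roles of the handler and subscriber may need adjusting, but the point stands: the request process only touches a socket, never the disk.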
