Python logging from multiple processes


Problem Description


I have a possibly long-running program that currently has 4 processes, but could be configured to have more. I have researched logging from multiple processes using Python's logging module and am using the SocketHandler approach discussed here. I never had any problems with a single logger (no sockets), but from what I read I was told it would eventually fail unexpectedly. As far as I understand, it's undefined what happens when you try to write to the same file at the same time. My code essentially does the following:

import logging
import os
import sys

log = logging.getLogger(__name__)

def monitor(...):
    # Spawn child processes with os.fork()
    # os.wait() and act accordingly

def main():
    log_server_pid = os.fork()
    if log_server_pid == 0:
        # Create a LogRecordSocketServer (daemon)
        ...
        sys.exit(0)
    # Add SocketHandler to root logger
    ...
    monitor(<configuration stuff>)

if __name__ == "__main__":
    main()
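
For context, here is a minimal sketch of what the receiving LogRecordSocketServer side might look like, adapted from the network-logging pattern in the Python logging cookbook (the class name and the combined.log filename are illustrative, not my actual code):

import logging
import logging.handlers
import pickle
import socketserver
import struct

class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    """Unpickle LogRecords sent by SocketHandler and re-handle them locally."""
    def handle(self):
        while True:
            # SocketHandler prefixes each pickled record with a 4-byte length
            header = self.connection.recv(4)
            if len(header) < 4:
                break
            slen = struct.unpack(">L", header)[0]
            data = self.connection.recv(slen)
            while len(data) < slen:
                data += self.connection.recv(slen - len(data))
            record = logging.makeLogRecord(pickle.loads(data))
            # Hand the record to the local logging machinery, which writes
            # the single combined log file
            logging.getLogger(record.name).handle(record)

if __name__ == "__main__":
    logging.basicConfig(filename="combined.log",
                        format="%(asctime)s %(name)s %(levelname)s %(message)s")
    server = socketserver.ThreadingTCPServer(
        ("localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT),
        LogRecordStreamHandler)
    server.serve_forever()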


So my questions are: Do I need to create a new log object after each os.fork()? What happens to the existing global log object?


With doing things the way I am, am I even getting around the problem I'm trying to avoid (multiple open files/sockets)? Will this fail, and if so, why? (I'd like to be able to tell whether similar future implementations will fail.)


Also, in what way does the "normal" (one log= expression) method of logging to one file from multiple processes fail? Does it raise an IOError/OSError? Or does it simply write incomplete data to the file?


If someone could provide an answer or links to help me out, that would be great. Thanks.


FYI: I am testing on Mac OS X Lion, and the code will probably end up running on a CentOS 6 VM on a Windows machine (if that matters). Whatever solution I use does not need to work on Windows, but should work on a Unix-based system.


UPDATE: This question has started to move away from logging-specific behavior and more into the realm of what Linux does with file descriptors during forks. I pulled out one of my college textbooks, and it seems that if you open a file in append mode from two processes (not before a fork), they will both be able to write to the file properly, as long as a single write doesn't exceed the actual kernel buffer (although line buffering might need to be used; I'm still not sure about that). This creates 2 file table entries and one v-node table entry. Opening a file and then forking isn't supposed to work, but it seems to, as long as you don't exceed the kernel buffer as before (I've done it in a previous program).
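
As a concrete illustration of the append-after-fork behavior described above, here is a hypothetical demo (not part of my program; demo.log is an arbitrary name) where each process opens the file in append mode after the fork:

import os

def worker(tag, path="demo.log"):
    # Opening *after* the fork gives each process its own file table entry;
    # buffering=1 requests line buffering in text mode, so each complete
    # line is flushed as a single append
    with open(path, "a", buffering=1) as f:
        for i in range(1000):
            f.write(f"{tag} {i}\n")

if __name__ == "__main__":
    pid = os.fork()
    worker("child" if pid == 0 else "parent")
    if pid == 0:
        os._exit(0)  # child exits without running the parent's cleanup
    os.wait()
    # Lines from the two processes may interleave in order, but each line
    # should arrive intact as long as it fits in one buffered write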


So I guess, if you want platform-independent multiprocessing logging, you use sockets and create a new SocketHandler after each fork to be safe, as Vinay suggested below (that should work everywhere). For me, since I have strong control over which OS my software runs on, I think I'm going to go with one global log object with a FileHandler, which opens in append mode by default and is line buffered on most OSes (see the sketch after the examples below). The documentation for open says: "A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used." Or I could just create my own logging stream to be sure of line buffering. And just to be clear, I'm OK with:

# Process A
a_file.write("A\n")
a_file.write("A\n")
# Process B
a_file.write("B\n")

producing...

A\n
B\n
A\n

as long as it doesn't produce...

AB\n
\n
A\n
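
For concreteness, here is a minimal sketch of the single-FileHandler setup I have in mind (monitor.log and the format string are illustrative):

import logging

# FileHandler opens its file in append mode by default (mode='a'), and the
# underlying StreamHandler flushes after emitting each record, so each log
# record typically reaches the OS as one write of a complete line
handler = logging.FileHandler("monitor.log")
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(process)d %(levelname)s %(name)s: %(message)s"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

log = logging.getLogger(__name__)
log.info("configured once, before any fork")

Including %(process)d in the format makes it easy to see which process wrote each line.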


Vinay (or anyone else), how wrong am I? Let me know. Thanks for any more clarity/certainty you can provide.

Recommended Answer


Do I need to create a new log object after each os.fork()? What happens to the existing global log object?


AFAIK the global log object remains pointing to the same logger in parent and child processes, so you shouldn't need to create a new one. However, I think you should create and add the SocketHandler after the fork() in monitor(), so that the socket server has four distinct connections, one to each child process. If you don't do this, then the child processes forked off in monitor() will inherit the SocketHandler and its socket handle from their parent, and I don't know for sure whether that will misbehave. The behaviour is likely to be OS-dependent, and you may just be lucky on OS X.
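
For illustration, here is a sketch of that suggestion applied to the question's fork-based setup (monitor() and the child count are from the question's pseudocode; the port is logging's default TCP logging port):

import logging
import logging.handlers
import os

def monitor(num_children=4):
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Add the SocketHandler *after* the fork, so this child gets its
            # own TCP connection to the log server instead of sharing a
            # socket inherited from the parent
            root = logging.getLogger()
            root.setLevel(logging.DEBUG)
            root.addHandler(logging.handlers.SocketHandler(
                "localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT))
            logging.getLogger(__name__).info("child %d started", os.getpid())
            # ... the child's real work goes here ...
            os._exit(0)
    for _ in range(num_children):
        os.wait()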


With doing things the way I am, am I even getting around the problem I'm trying to avoid (multiple open files/sockets)? Will this fail, and if so, why? (I'd like to be able to tell whether similar future implementations will fail.)


I wouldn't expect failure if you create the socket connection to the socket server after the last fork(), as I suggested above, but I'm not sure the behaviour is well-defined in any other case. You refer to multiple open files, but I see no reference to opening files in your pseudocode snippet, just opening sockets.


Also, in what way does the "normal" (one log= expression) method of logging to one file from multiple processes fail? Does it raise an IOError/OSError? Or does it simply write incomplete data to the file?


I think the behaviour is not well-defined, but one would expect failure modes to present as interspersed log messages from different processes in the file, e.g.

Process A writes first part of its message
Process B writes its message
Process A writes second part of its message


Update: If you use a FileHandler in the way you described in your comment, things will not be so good, due to the scenario described above: processes A and B both begin by pointing at the end of the file (because of append mode), but thereafter things can get out of sync, because (e.g. on a multiprocessor, but potentially even on a uniprocessor) one process can preempt another and write to the shared file handle before the other has finished doing so.
