How can I do an atomic write to stdout in Python?


Problem description

I've read in several sources that print is not thread-safe and that the workaround is to use sys.stdout.write instead. That still doesn't work for me: writes to stdout are not atomic.

Here's a short example (saved as parallelExperiment.py):

import os
import sys
from multiprocessing import Pool

def output(msg):
    msg = '%s%s' % (msg, os.linesep)
    sys.stdout.write(msg)

def func(input):
    output(u'pid:%d got input \"%s\"' % (os.getpid(), str(input)))

def executeFunctionInParallel(funcName, inputsList, maxParallelism):
    output(u'Executing function %s on input of size %d with maximum parallelism of %d' % (
        funcName.__name__, len(inputsList), maxParallelism))
    parallelismPool = Pool(processes=maxParallelism)
    executeBooleanResultsList = parallelismPool.map(funcName, inputsList)
    parallelismPool.close()
    output(u'Function %s executed on input of size %d  with maximum parallelism of %d' % (
        funcName.__name__, len(inputsList), maxParallelism))
    # if all parallel executions executed well - the boolean results list should all be True
    return all(executeBooleanResultsList)

if __name__ == "__main__":
    inputsList=[str(i) for i in range(20)]
    executeFunctionInParallel(func, inputsList, 4)

The output:

i. Output of calling python parallelExperiment.py (note that the word "pid" is garbled in some lines):

Executing function func on input of size 20 with maximum parallelism of 4
ppid:2240 got input "0"
id:4960 got input "2"
pid:4716 got input "4"
pid:4324 got input "6"
ppid:2240 got input "1"
id:4960 got input "3"
pid:4716 got input "5"
pid:4324 got input "7"
ppid:4960 got input "8"
id:2240 got input "10"
pid:4716 got input "12"
pid:4324 got input "14"
ppid:4960 got input "9"
id:2240 got input "11"
pid:4716 got input "13"
pid:4324 got input "15"
ppid:4960 got input "16"
id:2240 got input "18"
ppid:2240 got input "19"
id:4960 got input "17"
Function func executed on input of size 20  with maximum parallelism of 4

ii. Output of calling python parallelExperiment.py > parallelExperiment.log, that is, with stdout redirected to the file parallelExperiment.log (note that the lines are out of order: one message should be printed before executeFunctionInParallel runs func in parallel, and one after):

pid:3244 got input "4"
pid:3244 got input "5"
pid:3244 got input "12"
pid:3244 got input "13"
pid:240 got input "0"
pid:240 got input "1"
pid:240 got input "8"
pid:240 got input "9"
pid:240 got input "16"
pid:240 got input "17"
pid:1268 got input "2"
pid:1268 got input "3"
pid:1268 got input "10"
pid:1268 got input "11"
pid:1268 got input "18"
pid:1268 got input "19"
pid:3332 got input "6"
pid:3332 got input "7"
pid:3332 got input "14"
pid:3332 got input "15"
Executing function func on input of size 20 with maximum parallelism of 4
Function func executed on input of size 20  with maximum parallelism of 4

Recommended answer

This happens because multiprocessing.Pool uses subprocesses rather than threads, so you need explicit synchronization between the processes. Note that the linked example solves your issue.

import os
import sys
from multiprocessing import Pool, Lock

# Created at import time, so forked worker processes inherit this same lock.
lock = Lock()

def output(msg):
    msg = '%s%s' % (msg, os.linesep)
    # Write and flush while holding the lock, so the whole line leaves this
    # process's buffer before another process can write.
    with lock:
        sys.stdout.write(msg)
        sys.stdout.flush()

def func(input):
    output(u'pid:%d got input \"%s\"' % (os.getpid(), str(input)))

def executeFunctionInParallel(funcName, inputsList, maxParallelism):
    output(u'Executing function %s on input of size %d with maximum parallelism of %d' % (
        funcName.__name__, len(inputsList), maxParallelism))
    parallelismPool = Pool(processes=maxParallelism)
    executeBooleanResultsList = parallelismPool.map(funcName, inputsList)
    # Stop accepting work, then wait for every worker to exit before the final message.
    parallelismPool.close()
    parallelismPool.join()
    output(u'Function %s executed on input of size %d  with maximum parallelism of %d' % (
        funcName.__name__, len(inputsList), maxParallelism))
    # if all parallel executions executed well - the boolean results list should all be True
    return all(executeBooleanResultsList)

if __name__ == "__main__":
    inputsList=[str(i) for i in range(20)]
    executeFunctionInParallel(func, inputsList, 4)
