Python Pool Multiprocessing with functions

Question

Okay, I've been playing with some code, partly to get a better understanding of Python and partly to scrape some data from the web. Part of what I want to learn about is using Python's multiprocessing module and Pool.

I've got the basics working. However, because I wrote the procedure single-threaded first and then moved to using Pool to multi-thread the process, I have both global variables and calls to globally defined functions. I'm guessing both of these are bad, but when I search the web, things seem to get very complicated very fast or don't answer my questions.

First, can anybody confirm that global variables are bad and could lead to problems? To me this makes sense, because two threads could access the same variable at the same time.

Second, if I have a globally defined function that, for the sake of argument, processes a string and returns it using standard string functions, is it okay to call it from within the pool process?

Solution

Multithreading and multiprocessing are quite different in how your variables and functions can be accessed. Separate processes (multiprocessing) have separate memory spaces, so they cannot access the same instances of functions or variables. Each process works on its own copy, so a global variable shared across processes doesn't really exist. Sharing data between processes is typically done via pipes or queues that pass the data for you. Both the main process and the child process can have access to the same queue, though, so in a way you could think of that as a type of global variable.
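
As a rough illustration of the point above, here is a minimal sketch in which a child process changes its own copy of a global and reports the value back through a queue. The names `counter`, `bump_and_report` and `result_queue` are made up for this example and do not come from the original question.

```python
import multiprocessing as mp

counter = 0  # module-level global; every process ends up with its own copy


def bump_and_report(result_queue):
    """Increment this process's copy of counter and send the new value back."""
    global counter
    counter += 1
    result_queue.put(counter)


def main():
    result_queue = mp.Queue()
    child = mp.Process(target=bump_and_report, args=(result_queue,))
    child.start()
    value_from_child = result_queue.get()  # blocks until the child has put a value
    child.join()
    print("child saw counter =", value_from_child)  # child saw counter = 1
    print("parent still has counter =", counter)    # parent still has counter = 0


if __name__ == "__main__":
    main()
```

The parent's `counter` is untouched because the increment happened in the child's memory space; only the value explicitly put on the queue crosses the process boundary.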

With multithreading you can definitely access global variables, and it can be a good way to program if your program is simple. For example, a child thread may read the value of a variable set by the main thread and use it as a flag in the child thread's function. You need to be aware of thread-safe operations, however: as you say, complex operations by multiple threads on the same object can result in conflicts. In that case you need to use a thread lock or some other safe method. Many operations are naturally atomic and therefore thread-safe, though, for instance reading a single variable. There's a good list of thread-safe operations and thread synchronization on this page.
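
For the case where several threads modify the same global, a lock is one common option. The sketch below is only an illustration with invented names (`total`, `total_lock`, `add_many`, `run_threads`). It treats `total += 1` as a read-modify-write step that should not be assumed atomic, and so guards it with `threading.Lock`.

```python
import threading

total = 0                      # shared global, visible to every thread
total_lock = threading.Lock()  # guards every update of total


def add_many(n):
    """Add 1 to the shared total n times, holding the lock for each update."""
    global total
    for _ in range(n):
        with total_lock:
            total += 1


def run_threads(num_threads=4, n=10_000):
    """Reset total, run num_threads workers to completion, return the result."""
    global total
    total = 0
    threads = [threading.Thread(target=add_many, args=(n,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(run_threads())  # 40000
```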

Generally with multiprocessing and multithreading you have some time-consuming function that you pass to the thread or the process, but they won't be rerunning the same instance of that function. The example below shows a valid use case for multiple threads atomically accessing a global variable. The separate process, however, won't see the change, because it only has its own copy of the variable.

import multiprocessing as mp
import threading
import time

work_flag = True

def worker_func():
    global work_flag
    while True:
        if work_flag:
            # do stuff
            time.sleep(1)
            # print() call syntax, so the example runs on Python 3
            print(mp.current_process().name, 'working, work_flag =', work_flag)
        else:
            time.sleep(0.1)

def main():
    global work_flag

    # processes can't access the same "instance" of work_flag!
    process = mp.Process(target = worker_func)
    process.daemon = True
    process.start()

    # threads can safely read global work_flag
    thread = threading.Thread(target = worker_func)
    thread.daemon = True
    thread.start()

    while True:
        time.sleep(3)
        # changing this flag will stop the thread, but not the process
        work_flag = False

if __name__ == '__main__':
    main()
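
On the second part of the question, a globally defined function that only uses its argument and returns a value is generally fine to call from pool workers. Here is a minimal sketch, assuming a made-up `clean_text` function standing in for the string processing in the question, and sample strings invented for the demo. The function is defined at module level because `Pool` sends it to the workers by reference, which does not work for lambdas or nested functions.

```python
import multiprocessing as mp


def clean_text(text):
    """Hypothetical string-processing step: trim whitespace and lowercase."""
    return text.strip().lower()


def main():
    pages = ["  Hello World ", "PYTHON Pool  ", " scrape ME"]
    # Each worker process calls clean_text on some of the items and the
    # results are returned to the parent in the original order.
    with mp.Pool(processes=2) as pool:
        cleaned = pool.map(clean_text, pages)
    print(cleaned)  # ['hello world', 'python pool', 'scrape me']


if __name__ == "__main__":
    main()
```

Because `clean_text` reads no globals and returns its result, nothing needs to be shared between processes. The return values of `pool.map` are the only data that travels back to the parent.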
