Asynchronous and parallel generator

Problem description

I have a Python script which lazily collects data, creates training samples and passes them to my ML model for learning. For now I am generating the data with a standard Python generator, which to my knowledge is synchronous. I am looking for a smart, clean way to make my generator truly asynchronous, so that when I use it as an iterator, processing of the next data sample starts immediately after I pull the previous one out. Consider the following example:

def asyncgen():
    for i in range(5):
        print("I want this part to work asynchronously :(")
        k = 0  # separate counter so the loop variable i is not shadowed
        while k < 1e8:
            k += 1
        yield "Hi"

a = asyncgen()
for w in a:
    print(w)
    i = 0
    while (i < 1e8):
        i += 1

How do I make my generator start processing (asynchronously, in a different process) right after I receive "Hi"? Currently the processing starts only after the for loop calls next().

I have been looking into asynchronous generators (PEP 525), but they seem to work only concurrently and not in parallel (damn you GIL!). What is a nice, preferably native, way to do this in Python?

Recommended answer

For CPU-bound pure-Python code, the standard-library way to get around the GIL is multiprocessing: each process runs its own interpreter with its own GIL.

from multiprocessing import Process

def asynch_part(i):
    print("I want this part to work asynchronously :(")
    k = 0
    while k < 1e8:
        k += 1
    # A Process target cannot yield to its parent: with `yield` here, calling the
    # function would only build a generator object and the loop would never run.
    print("Hi from " + str(i))

if __name__ == '__main__':
    p = []
    for i in range(5): # I am keeping the processes listed and trackable,
                       # perhaps you do not care. os.getpid() tracks them anyway
        p.append(Process(target=asynch_part, args=(i,)))  # args must be a tuple
        p[i].start()

    for i in range(5):
        p[i].join()

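So in the above code asynch_part is run 5 times independently, as parallel processes, which are then joined before the program ends. Keeping the list p is just illustrative. Note that the workers do not hand anything back to the parent; returning results needs a channel between processes, such as a multiprocessing.Queue.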
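To get back the generator-style interface the question asks for, one option is to run an ordinary generator in a child process and pass its items to the parent through a bounded multiprocessing.Queue. The following is only a minimal sketch under a few assumptions: the names background_generator, _worker and slow_samples, and the prefetch and ctx parameters, are made up for this illustration, and both the producer function and the items it yields must be picklable (the producer defined at module top level) when the start method is spawn or forkserver.

```python
import multiprocessing as mp


def _worker(queue, producer, args):
    """Runs in the child: iterate the plain generator and ship items to the parent."""
    try:
        for item in producer(*args):
            queue.put(("item", item))   # blocks while the bounded queue is full
        queue.put(("done", None))
    except Exception as exc:            # report failures instead of hanging the parent
        queue.put(("error", repr(exc)))


def background_generator(producer, *args, prefetch=1, ctx=None):
    """Hypothetical helper: start producer(*args) in a child process right away
    and return a generator over its items. `prefetch` bounds the queue size."""
    ctx = ctx or mp.get_context()
    queue = ctx.Queue(maxsize=prefetch)
    proc = ctx.Process(target=_worker, args=(queue, producer, args), daemon=True)
    proc.start()                        # work begins now, before the first next()

    def consume():
        try:
            while True:
                kind, payload = queue.get()
                if kind == "done":
                    return
                if kind == "error":
                    raise RuntimeError("producer failed: " + payload)
                yield payload           # the child keeps working while we are paused here
        finally:
            proc.terminate()            # also covers a consumer that stops early
            proc.join()

    return consume()


def slow_samples(count):
    """Stand-in for the question's generator: burn some CPU, then yield a sample."""
    for i in range(count):
        k = 0
        while k < 10**6:
            k += 1
        yield f"Hi {i}"


if __name__ == "__main__":
    for sample in background_generator(slow_samples, 5):
        print(sample)                   # meanwhile the child prepares the next sample
```

Because the queue is bounded, the child works only a limited number of items ahead of the consumer and then blocks until the parent pulls the next one, so memory use stays bounded. Each item is pickled and copied between processes, which can dominate the cost for large training samples.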
