How to use boto3 client with Python multiprocessing?

Question

The code looks like this:

import multiprocessing as mp
from functools import partial

import boto3
import numpy as np


s3 = boto3.client('s3')

def _something(part, target_part, slack):
    # Some mixed integer programming stuff related to the variable archive
    return np.array(some_variable_related_to_archive)


def do(s3):
    archive = np.load(s3.get_object('some_key'))  # Simplified -- details not relevant
    pool = mp.Pool()
    sub_process = partial(_something, slack=0.1)
    parts = np.array_split(archive, some_int)
    target_parts = np.array(things)

    out = pool.starmap(sub_process, zip(parts, target_parts))  # Error occurs at this line

    pool.close()
    pool.join()

do(s3)

Error:

_pickle.PicklingError: Can't pickle <class 'botocore.client.S3'>: attribute lookup S3 on botocore.client failed
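
multiprocessing sends each task's function and arguments to the worker processes with pickle, and this error is what pickle raises for an object it cannot serialize. A minimal sketch for checking an object up front: is_picklable is a hypothetical helper (not part of any library), and a threading.Lock stands in for the boto3 client, an assumption made so the example runs without boto3 or AWS credentials.

```python
import pickle
import threading
from functools import partial


def is_picklable(obj):
    """Return True if pickle can serialize obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except Exception:  # PicklingError, TypeError, AttributeError, ...
        return False
    return True


if __name__ == "__main__":
    # Plain data and partials of importable functions pickle fine.
    print(is_picklable([1, 2, 3]))         # True
    print(is_picklable(partial(pow, 2)))   # True
    # A lock (standing in for the boto3 client) and a lambda do not.
    print(is_picklable(threading.Lock()))  # False
    print(is_picklable(lambda x: x))       # False
```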

I have very limited experience with the Python multiprocessing library. I'm not sure why it throws the above error, given that the S3 client is not an argument of the function passed to starmap(). Note that the code runs fine if the archive file is loaded from disk rather than from S3.

Any help/guidance would be greatly appreciated.

Answer

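Everything pool.starmap() sends to the worker processes (the function and each tuple of arguments) must be picklable, and boto3 S3 clients are not. Moving the S3 calls out of the function that calls pool.starmap(), and passing in the already-loaded archive instead of the client, can solve the issue: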

import multiprocessing as mp
from functools import partial

import boto3
import numpy as np


def _something(part, target_part, slack):
    # Some mixed integer programming stuff related to the variable archive
    return np.array(some_variable_related_to_archive)


def do(archive):  # pass the previously loaded archive, not the s3 client, into the function
    pool = mp.Pool()
    sub_process = partial(_something, slack=0.1)
    parts = np.array_split(archive, some_int)
    target_parts = np.array(things)

    # Only picklable objects (the partial and NumPy arrays) are sent to the workers
    out = pool.starmap(sub_process, zip(parts, target_parts))

    pool.close()
    pool.join()
    return out


if __name__ == '__main__':
    # Move the s3 calls here, outside of the do() function
    s3 = boto3.client('s3')
    archive = np.load(s3.get_object('some_key'))  # Simplified -- details not relevant
    do(archive)  # pass the previously loaded archive, not the s3 client
