boto3 cannot create client on pyspark worker?


Problem description


I'm trying to send data from the workers of a Pyspark RDD to an SQS queue, using boto3 to talk with AWS. I need to send data directly from the partitions, rather than collecting the RDD and sending data from the driver.


I am able to send messages to SQS via boto3 locally & from the Spark driver; also, I can import boto3 and create a boto3 session on the partitions. However, when I try to create a client or resource from the partitions I receive an error. I believe boto3 is not correctly creating a client, but I'm not entirely sure on that point. My code looks like this:

def get_client(x):  # x is the partition iterator that mapPartitions passes in
    import boto3
    client = boto3.client('sqs', region_name="us-east-1",
                          aws_access_key_id="myaccesskey",
                          aws_secret_access_key="mysecretaccesskey")
    return x

rdd_with_client = rdd.mapPartitions(get_client)

The error:

DataNotFoundError: Unable to load data for: endpoints

A longer traceback:

File "<stdin>", line 4, in get_client
  File "./rebuilt.zip/boto3/session.py", line 250, in client
    aws_session_token=aws_session_token, config=config)
  File "./rebuilt.zip/botocore/session.py", line 810, in create_client
    endpoint_resolver = self.get_component('endpoint_resolver')
  File "./rebuilt.zip/botocore/session.py", line 691, in get_component
    return self._components.get_component(name)
  File "./rebuilt.zip/botocore/session.py", line 872, in get_component
    self._components[name] = factory()
  File "./rebuilt.zip/botocore/session.py", line 184, in create_default_resolver
    endpoints = loader.load_data('endpoints')
  File "./rebuilt.zip/botocore/loaders.py", line 123, in _wrapper
    data = func(self, *args, **kwargs)
  File "./rebuilt.zip/botocore/loaders.py", line 382, in load_data
    raise DataNotFoundError(data_path=name)
DataNotFoundError: Unable to load data for: endpoints


I've also tried modifying my function to create a resource instead of the explicit client, to see if it could find & use the default client setup. In that case, my code is:

def get_resource(x):
    import boto3
    sqs = boto3.resource('sqs', region_name="us-east-1",
                         aws_access_key_id="myaccesskey",
                         aws_secret_access_key="mysecretaccesskey")
    return x

rdd_with_client = rdd.mapPartitions(get_resource)


I receive an error pointing to a has_low_level_client parameter, which is triggered because the client doesn't exist; the traceback says:

File "/usr/lib/spark/python/pyspark/rdd.py", line 2253, in pipeline_func
  File "/usr/lib/spark/python/pyspark/rdd.py", line 270, in func
  File "/usr/lib/spark/python/pyspark/rdd.py", line 689, in func
  File "<stdin>", line 4, in session_resource
  File "./rebuilt.zip/boto3/session.py", line 329, in resource
    has_low_level_client)
ResourceNotExistsError: The 'sqs' resource does not exist.
The available resources are:
   -


No resources available because, I think, there's no client to house them.


I've been banging my head against this one for a few days now. Any help appreciated!

Answer


This is because you have the boto3 bundle as a zip file.

"./rebuilt.zip/boto3"


During initialisation, botocore loads a set of JSON data files (the endpoint definitions, among others) from a data directory inside its installation folder. Because your boto3 lives inside a zip package, the loader cannot read those data files from there.


The solution is, rather than distributing boto3 inside a zip, to have boto3 installed in your Spark environment. Be careful here: you may want to install boto3 on both the master node and the worker nodes, depending on how you implement your app. The safe bet is to install it on both.
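Once boto3 is installed on the workers, creating the client inside mapPartitions works. Below is a minimal sketch, not the poster's code: the queue URL is a placeholder, credentials are assumed to come from the environment or an instance role, and the batching helper is my own addition (SQS send_message_batch accepts at most 10 entries per call):

```python
def chunked(iterable, size):
    """Yield lists of at most `size` items (SQS batches allow up to 10)."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def send_partition(partition):
    import boto3  # import on the worker, where boto3 is now installed
    client = boto3.client('sqs', region_name="us-east-1")
    for batch in chunked(partition, 10):
        client.send_message_batch(
            QueueUrl="my-queue-url",  # placeholder: use your queue's URL
            Entries=[{"Id": str(i), "MessageBody": str(m)}
                     for i, m in enumerate(batch)],
        )
    return iter([])  # mapPartitions expects an iterable back

# rdd.mapPartitions(send_partition).count()  # count() forces evaluation
```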


If you are using EMR, you can use a bootstrap action to do it.
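A bootstrap action for this could be sketched as follows (a hypothetical script; the file name and S3 path are placeholders):

```shell
#!/bin/bash
# install-boto3.sh - hypothetical EMR bootstrap action.
# Bootstrap actions run on every node (master and workers) before
# Spark starts, so boto3 ends up installed cluster-wide.
set -e
sudo pip install boto3
```

Upload the script to S3 and register it when creating the cluster, e.g. `aws emr create-cluster ... --bootstrap-actions Path=s3://my-bucket/install-boto3.sh` (the bucket name is a placeholder).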

