Spark写入S3 V4 SignatureDoesNotMatch错误 [英] Spark Write to S3 V4 SignatureDoesNotMatch Error

查看：92 发布时间：2020/8/23 6:18:14 amazon-web-services apache-spark amazon-s3

本文介绍了Spark写入S3 V4 SignatureDoesNotMatch错误的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在尝试用Spark将数据帧写入S3时遇到了S3 SignatureDoesNotMatch.

I encounter S3 SignatureDoesNotMatch while trying to write Dataframe to S3 with Spark.

症状/事物尝试过:

该代码有时失败，但有时；
该代码可以从S3 读取，没有任何问题，并且可以不时写入S3，从而排除了错误的配置设置，例如 S3A/enableV4/错误的密钥/区域端点等
已根据S3文档 S3设置了S3A终结点端点;
确保AWS_SECRETY_KEY按照建议的此处;
使用NTP确保服务器时间是同步的；
以下内容已在EC2 m3.xlarge上进行了测试，并且spark-2.0.2-bin-hadoop2.7在本地模式下运行；
将文件写入本地fs时，问题不再存在；
现在的解决方法是用s3fs装载存储桶并写入其中.但是，这并不理想，因为s3fs经常因Spark承受的压力而死亡；

The code fail sometimes but works sometimes;
The code can read from S3 without any problem, and be able to write to S3 from time to time, which rules out wrong config settings like S3A / enableV4 / Wrong Key / Region Endpoint etc.
The S3A endpoint had been set according to the S3 docs S3 Endpoint;
Made sure the AWS_SECRETY_KEY does not contain any non-alphanumeric as per suggested here;
Made sure server time is in-sync by using NTP;
The following was tested on EC2 m3.xlarge with spark-2.0.2-bin-hadoop2.7 running on Local mode;
The issue is gone when the files are written to local fs;
right now the workaround was to mount the bucket with s3fs and write to there; however this is not ideal as s3fs dies quite often from the stress Spark put to it;

代码可以归结为:

spark-submit\
    --verbose\
    --conf spark.hadoop.fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem \
    --conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3.S3FileSystem \
    --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem\
    --packages org.apache.hadoop:hadoop-aws:2.7.3\
    --driver-java-options '-Dcom.amazonaws.services.s3.enableV4'\
    foobar.py


# foobar.py
sc = SparkContext.getOrCreate()
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", 'xxx')
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", 'xxx')
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", 's3.dualstack.ap-southeast-2.amazonaws.com')

hc = SparkSession.builder.enableHiveSupport().getOrCreate()
dataframe = hc.read.parquet(in_file_path)

dataframe.write.csv(
    path=out_file_path,
    mode='overwrite',
    compression='gzip',
    sep=',',
    quote='"',
    escape='\\',
    escapeQuotes='true',
)

Spark泄漏了以下错误.

Spark spills the following error.

将log4j设置为详细，似乎发生了以下情况:

Set log4j to verbose, it appears the following had happened:

每个人都将被输出到S3 /_temporary/foorbar.part-xxx上的污点位置；
PUT调用会将分区移到最终位置；
几次成功的PUT调用后，所有后续的PUT调用由于403而失败；
由于reuqets是由aws-java-sdk制作的，因此不确定在应用程序级别上该怎么做； -以下日志来自另一个完全相同的错误事件；

Each individual will be output to staing location on S3 /_temporary/foorbar.part-xxx;
A PUT call will move the partitions into final location;
After a few successfully PUT calls, all the subsequent PUT call failed due to 403;
As the reuqets were made by aws-java-sdk, not sure what to do on application level; -- The following log were from another event with the exact same error;

 >> PUT XXX/part-r-00025-ae3d5235-932f-4b7d-ae55-b159d1c1343d.gz.parquet HTTP/1.1
 >> Host: XXX.s3-ap-southeast-2.amazonaws.com
 >> x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
 >> X-Amz-Date: 20161104T005749Z
 >> x-amz-metadata-directive: REPLACE
 >> Connection: close
 >> User-Agent: aws-sdk-java/1.10.11 Linux/3.13.0-100-generic OpenJDK_64-Bit_Server_VM/25.91-b14/1.8.0_91 com.amazonaws.services.s3.transfer.TransferManager/1.10.11
 >> x-amz-server-side-encryption-aws-kms-key-id: 5f88a222-715c-4a46-a64c-9323d2d9418c
 >> x-amz-server-side-encryption: aws:kms
 >> x-amz-copy-source: /XXX/_temporary/0/task_201611040057_0001_m_000025/part-r-00025-ae3d5235-932f-4b7d-ae55-b159d1c1343d.gz.parquet
 >> Accept-Ranges: bytes
 >> Authorization: AWS4-HMAC-SHA256 Credential=AKIAJZCSOJPB5VX2B6NA/20161104/ap-southeast-2/s3/aws4_request, SignedHeaders=accept-ranges;connection;content-length;content-type;etag;host;last-modified;user-agent;x-amz-content-sha256;x-amz-copy-source;x-amz-date;x-amz-metadata-directive;x-amz-server-side-encryption;x-amz-server-side-encryption-aws-kms-key-id, Signature=48e5fe2f9e771dc07a9c98c7fd98972a99b53bfad3b653151f2fcba67cff2f8d
 >> ETag: 31436915380783143f00299ca6c09253
 >> Content-Type: application/octet-stream
 >> Content-Length: 0
DEBUG wire:  << "HTTP/1.1 403 Forbidden[\r][\n]"
DEBUG wire:  << "x-amz-request-id: 849F990DDC1F3684[\r][\n]"
DEBUG wire:  << "x-amz-id-2: 6y16TuQeV7CDrXs5s7eHwhrpa1Ymf5zX3IrSuogAqz9N+UN2XdYGL2FCmveqKM2jpGiaek5rUkM=[\r][\n]"
DEBUG wire:  << "Content-Type: application/xml[\r][\n]"
DEBUG wire:  << "Transfer-Encoding: chunked[\r][\n]"
DEBUG wire:  << "Date: Fri, 04 Nov 2016 00:57:48 GMT[\r][\n]"
DEBUG wire:  << "Server: AmazonS3[\r][\n]"
DEBUG wire:  << "Connection: close[\r][\n]"
DEBUG wire:  << "[\r][\n]"
DEBUG DefaultClientConnection: Receiving response: HTTP/1.1 403 Forbidden
 << HTTP/1.1 403 Forbidden
 << x-amz-request-id: 849F990DDC1F3684
 << x-amz-id-2: 6y16TuQeV7CDrXs5s7eHwhrpa1Ymf5zX3IrSuogAqz9N+UN2XdYGL2FCmveqKM2jpGiaek5rUkM=
 << Content-Type: application/xml
 << Transfer-Encoding: chunked
 << Date: Fri, 04 Nov 2016 00:57:48 GMT
 << Server: AmazonS3
 << Connection: close
DEBUG requestId: x-amzn-RequestId: not available

Spark写入S3 V4 SignatureDoesNotMatch错误 [英] Spark Write to S3 V4 SignatureDoesNotMatch Error

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

Spark写入S3 V4 SignatureDoesNotMatch错误 [英] Spark Write to S3 V4 SignatureDoesNotMatch Error

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭