Spark Scala S3 storage: permission denied
Problem description
I've read a lot of topics on the Internet on how to get Spark working with S3, but still nothing works properly. I've downloaded Spark 2.3.2 with Hadoop 2.7 and above.
I've copied only some libraries from Hadoop 2.7.7 (which matches the Spark/Hadoop version) into the Spark jars folder:
- hadoop-aws-2.7.7.jar
- hadoop-auth-2.7.7.jar
- aws-java-sdk-1.7.4.jar
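As an alternative to copying jars by hand, the same dependencies can be pulled at launch time with `--packages` (a sketch using the versions from the jar list above; adjust them to match your own Hadoop build):

```
spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.7,com.amazonaws:aws-java-sdk:1.7.4
```

This lets Spark resolve the artifacts from Maven Central instead of relying on manually placed files in the jars folder.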
Still, I can't use either S3N or S3A to get my file read by Spark.
For S3A I get this exception:
sc.hadoopConfiguration.set("fs.s3a.access.key","myaccesskey")
sc.hadoopConfiguration.set("fs.s3a.secret.key","mysecretkey")
val file = sc.textFile("s3a://my.domain:8080/test_bucket/test_file.txt")
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: AE203E7293ZZA3ED, AWS Error Code: null, AWS Error Message: Forbidden
Using this piece of Python, plus some more code, I can list my buckets, list my files, download files, read files from my computer, and get a file URL. This code gives me the following file URL:
What should I install / set up / download to get Spark able to read and write from my S3 server?
Edit 3:
Using the debug tool from the comments, here's the result. It seems the issue is with the signature; I'm not sure what that means.
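For context on what "signature" means here: `S3SignerType` (set below in the answer) selects S3's legacy Signature Version 2 scheme, where the client HMAC-SHA1-signs a canonical string and sends it in the `Authorization` header; a 403 Forbidden typically means the server computed a different signature. A minimal sketch of that scheme (illustrative only, with placeholder credentials; real clients must also canonicalize `x-amz-*` headers):

```python
import base64
import hashlib
import hmac

def sigv2_authorization(access_key, secret_key, verb, resource,
                        date, content_md5="", content_type=""):
    # Legacy S3 "SigV2" string-to-sign: verb, MD5, content type,
    # date, and the canonicalized resource path, joined by newlines.
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    # Signature is Base64(HMAC-SHA1(secret_key, string_to_sign)).
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return f"AWS {access_key}:{signature}"

# Placeholder values, mirroring the question's bucket and file:
header = sigv2_authorization(
    "myaccesskey", "mysecretkey", "GET",
    "/test_bucket/test_file.txt", "Tue, 27 Mar 2007 19:36:42 +0000")
print(header)
```

If client and server disagree on any component of the string-to-sign (for example the resource path under path-style vs. virtual-host addressing), the signatures diverge and the request is rejected with 403.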
Recommended answer
First you will need to download the hadoop-aws.jar and aws-java-sdk.jar versions that match your Spark/Hadoop release and add them to the jars folder inside the Spark folder.
Then you will need to specify the server you will use and enable path-style access if your S3 server does not support dynamic DNS:
sc.hadoopConfiguration.set("fs.s3a.path.style.access","true")
sc.hadoopConfiguration.set("fs.s3a.endpoint","my.domain:8080")
// I had to change the signature version because I have an old S3 API implementation:
sc.hadoopConfiguration.set("fs.s3a.signing-algorithm","S3SignerType")
Here is my final code:
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.access.key","mykey")
sc.hadoopConfiguration.set("fs.s3a.secret.key","mysecret")
sc.hadoopConfiguration.set("fs.s3a.endpoint","my.domain:8080")
sc.hadoopConfiguration.set("fs.s3a.connection.ssl.enabled","true")
sc.hadoopConfiguration.set("fs.s3a.path.style.access","true")
sc.hadoopConfiguration.set("fs.s3a.signing-algorithm","S3SignerType")
val tmp = sc.textFile("s3a://test_bucket/test_file.txt")
tmp.count()
I would recommend putting most of the settings inside spark-defaults.conf:
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.path.style.access true
spark.hadoop.fs.s3a.endpoint mydomain:8080
spark.hadoop.fs.s3a.connection.ssl.enabled true
spark.hadoop.fs.s3a.signing-algorithm S3SignerType
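The credentials themselves can also go through the same `spark.hadoop.*` passthrough if you accept having them on disk (a config fragment with placeholder values; a credentials provider or environment variables avoid writing secrets to the file):

```
spark.hadoop.fs.s3a.access.key mykey
spark.hadoop.fs.s3a.secret.key mysecret
```

With these in spark-defaults.conf, the Scala code shrinks to just the `textFile` call.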
One of the issues I had was setting spark.hadoop.fs.s3a.connection.timeout to 10: this value is interpreted in milliseconds prior to Hadoop 3, which gave me a very long timeout; the error message would appear 1.5 minutes after the attempt to read a file.
PS:
Special thanks to Steve Loughran.
Thanks a lot for the precious help.