Spark is inventing its own AWS secretKey


Question


I'm trying to read an S3 bucket from Spark, and up until today Spark has always complained that the request returns 403:

# Inject the S3A credentials into the underlying Hadoop configuration.
hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
# Read every log object under the prefix as an RDD of lines.
logs = spark_context.textFile("s3a://mybucket/logs/*")
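
As a quick sanity check (a sketch, not part of the original question), the same Hadoop configuration handle can be read back to confirm that the values the S3A connector will see are the ones that were just set:

# Sketch: read the settings back from the hadoopConf handle used above.
print(hadoopConf.get("fs.s3a.access.key"))
print(hadoopConf.get("fs.s3a.impl"))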

Spark was saying .... Invalid Access key [ACCESSKEY]

However, with the same ACCESSKEY and SECRETKEY, this was working with the aws-cli:

aws s3 ls mybucket/logs/

and in Python with boto3 this was working:

import boto3

resource = boto3.resource("s3", region_name="us-east-1")
resource.Object("mybucket", "logs/text.py").put(
    Body=open("text.py", "rb"), ContentType="text/x-py")

so my credentials ARE valid, and the problem is definitely something with Spark..

Today I decided to turn on the "DEBUG" log for all of Spark, and to my surprise... Spark is NOT using the [SECRETKEY] I have provided but instead... adds a random one???

17/03/08 10:40:04 DEBUG request: Sending Request: HEAD https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS ACCESSKEY:[RANDON-SECRET-KEY], User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65, Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )

This is why it still returns 403! Spark is not using the key I provided with fs.s3a.secret.key but instead invents a random one??
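
For what it's worth (a note that is not part of the original post): the value after the colon in an Authorization: AWS ACCESSKEY:... header is the AWS Signature Version 2 request signature, an HMAC-SHA1 over the request description (which includes the Date header) keyed with the secret key, so it is expected to look random and to change on every request. A minimal sketch of how such a value is derived, using made-up placeholder values:

import base64
import hashlib
import hmac

# Placeholder values, for illustration only.
secret_key = "SECRETKEY"
date = "Wed, 08 Mar 2017 10:40:04 GMT"

# AWS Signature Version 2: base64(HMAC-SHA1(secret, string-to-sign)); the
# string-to-sign embeds the request's Date header and resource path.
string_to_sign = "HEAD\n\napplication/x-www-form-urlencoded; charset=utf-8\n%s\n/mybucket/" % date
signature = base64.b64encode(
    hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest())
print(signature.decode())  # this kind of value is what appears after the colon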

For the record, I'm running this locally on my machine (OS X) with this command:

spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py
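
As an aside (a hedged variant, not from the original post): Spark forwards spark.hadoop.* properties into the Hadoop configuration, so the same s3a keys can also be supplied when the SparkContext is created instead of mutating hadoopConfiguration afterwards:

from pyspark import SparkConf, SparkContext

# Sketch: spark.hadoop.* properties are copied into the Hadoop configuration,
# so the S3A credentials can be set before the context exists.
conf = (SparkConf()
        .set("spark.hadoop.fs.s3a.access.key", "ACCESSKEY")
        .set("spark.hadoop.fs.s3a.secret.key", "SECRETKEY"))
spark_context = SparkContext(conf=conf)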

Could someone enlighten me on this?

Solution

I ran into a similar issue. Requests that were using valid AWS credentials returned a 403 Forbidden, but only on certain machines. Eventually I found out that the system time on those particular machines was 10 minutes behind. Synchronizing the system clock solved the problem.

Hope this helps!
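
This matches how S3 request signing works: the signature S3 verifies is computed over a string that includes the request's Date header, and S3 rejects requests whose timestamp is more than about 15 minutes away from its own clock (the RequestTimeTooSkewed error), which shows up as a 403. A small diagnostic sketch (not from the original answer) that compares the local clock with the Date header in S3's response; the bucket name is just the placeholder from the question:

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import urllib.error
import urllib.request

# HEAD the bucket endpoint and compare S3's Date header with the local clock;
# a gap of more than a few minutes will break request signing.
request = urllib.request.Request("https://mybucket.s3.amazonaws.com/", method="HEAD")
try:
    server_date = urllib.request.urlopen(request).headers["Date"]
except urllib.error.HTTPError as err:
    server_date = err.headers["Date"]  # S3 error responses also carry a Date header

skew = datetime.now(timezone.utc) - parsedate_to_datetime(server_date)
print("local clock differs from S3 by:", skew)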

