Spark is inventing its own AWS secretKey


Problem description


I'm trying to read an S3 bucket from Spark, and up until today Spark has always complained that the request returns a 403

# set the S3A credentials on the underlying Hadoop configuration
hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

# read the log files from the bucket
logs = spark_context.textFile("s3a://mybucket/logs/*")

Spark was saying .... Invalid Access key [ACCESSKEY]
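
(For reference, the same S3A credentials can also be supplied when the SparkContext is created, through Spark's spark.hadoop.* properties; the sketch below is just an equivalent way of passing the keys and is not taken from the original post.)

from pyspark import SparkConf, SparkContext

# any "spark.hadoop.*" property is copied into the Hadoop configuration,
# so the S3A keys can be set up front instead of via hadoopConfiguration()
conf = (SparkConf()
        .set("spark.hadoop.fs.s3a.access.key", "ACCESSKEY")
        .set("spark.hadoop.fs.s3a.secret.key", "SECRETKEY"))
spark_context = SparkContext(conf=conf)
logs = spark_context.textFile("s3a://mybucket/logs/*")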

However, with the same ACCESSKEY and SECRETKEY, this was working with the aws-cli:

aws s3 ls mybucket/logs/

and in Python, boto3 was working:

import boto3

# uploading with boto3 works with the same credentials
# (picked up from the default credential chain)
resource = boto3.resource("s3", region_name="us-east-1")
resource.Object("mybucket", "logs/text.py") \
    .put(Body=open("text.py", "rb"), ContentType="text/x-py")
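
(For completeness, the same check that the aws-cli ls performs can also be done from boto3; this listing is not in the original post, just an equivalent way to verify the credentials.)

import boto3

# list the same prefix that the aws-cli command above lists,
# using whatever credentials boto3 picks up from its default chain
s3 = boto3.resource("s3", region_name="us-east-1")
for obj in s3.Bucket("mybucket").objects.filter(Prefix="logs/"):
    print(obj.key)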

So my credentials ARE valid, and the problem is definitely something with Spark.

Today I decided to turn on DEBUG logging for the whole of Spark, and to my surprise... Spark is NOT using the [SECRETKEY] I have provided, but instead... adds a random one???

17/03/08 10:40:04 DEBUG request: Sending Request: HEAD https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS ACCESSKEY:[RANDOM-SECRET-KEY], User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65, Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )

This is why it still returns 403! Spark is not using the key I provided with fs.s3a.secret.key but instead invents a random one??
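
(For anyone who wants to reproduce the log output above: the post doesn't say how the DEBUG logging was enabled; one possible way from PySpark, sketched here as an assumption rather than the author's actual method, is to raise the log level on the SparkContext.)

# spark_context is the SparkContext from the snippet above; raising the log level
# makes the AWS SDK's request logging (logger "com.amazonaws.request") visible
spark_context.setLogLevel("DEBUG")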

For the record, I'm running this locally on my machine (OSX) with this command:

spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py

Could someone enlighten me on this?

Solution

I ran into a similar issue. Requests that were using valid AWS credentials returned a 403 Forbidden, but only on certain machines. Eventually I found out that the system time on those particular machines was 10 minutes behind. Synchronizing the system clock solved the problem.
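
(A quick way to check for this kind of clock skew is to compare the local clock against the Date header that S3 returns. The sketch below is not from the original answer and assumes the requests library is installed.)

import requests
from email.utils import parsedate_to_datetime
from datetime import datetime, timezone

# compare local UTC time with the Date header S3 sends back;
# a skew of more than a few minutes makes signed requests fail with 403
resp = requests.head("https://s3.amazonaws.com")
server_time = parsedate_to_datetime(resp.headers["Date"])
local_time = datetime.now(timezone.utc)
print("clock skew:", local_time - server_time)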

Hope this helps!

