Spark is inventing its own AWS secretKey
Problem description
I'm trying to read an S3 bucket from Spark, and up until today Spark has always complained that the request returns 403:
hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
logs = spark_context.textFile("s3a://mybucket/logs/*")
Spark was saying .... Invalid Access key [ACCESSKEY]
However, with the same ACCESSKEY and SECRETKEY, this works with the aws-cli:
aws s3 ls mybucket/logs/
and in Python with boto3 this works:
import boto3

resource = boto3.resource("s3", region_name="us-east-1")
resource.Object("mybucket", "logs/text.py").put(
    Body=open("text.py", "rb"), ContentType="text/x-py")
so my credentials ARE valid, and the problem is definitely something with Spark..
Today I decided to turn on the "DEBUG" log for the entire Spark job, and to my surprise... Spark is NOT using the [SECRETKEY] I provided but instead... adds a random one???
17/03/08 10:40:04 DEBUG request: Sending Request: HEAD https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS ACCESSKEY:[RANDON-SECRET-KEY], User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65, Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )
This is why it still returns 403! Spark is not using the key I provided with fs.s3a.secret.key but instead invents a random one??
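For what it's worth, the string after the colon in an `Authorization: AWS ACCESSKEY:...` header is not the secret key itself. With AWS Signature Version 2 (which aws-sdk-java 1.7.4 uses for S3), it is a per-request signature, Base64(HMAC-SHA1(secret, StringToSign)), which is why it looks like a random string. A minimal sketch of the signing step (the StringToSign value below is a hypothetical illustration, not the exact one Spark builds for this request):

```python
import base64
import hashlib
import hmac

def sign_v2(secret_key: str, string_to_sign: str) -> str:
    """AWS Signature Version 2: Base64(HMAC-SHA1(secret, string_to_sign))."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

# Hypothetical StringToSign for a HEAD request (illustrative only):
string_to_sign = "HEAD\n\n\nWed, 08 Mar 2017 10:40:04 GMT\n/mybucket/"
print(sign_v2("SECRETKEY", string_to_sign))  # a random-looking Base64 string
```

Since the signature changes whenever any signed element (such as the Date header) changes, the value in the debug log will never match the configured secret key.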
For the record, I'm running this locally on my machine (OSX) with this command:
spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py
Could someone enlighten me on this?
I ran into a similar issue. Requests that were using valid AWS credentials returned a 403 Forbidden, but only on certain machines. Eventually I found out that the system time on those particular machines was 10 minutes behind. Synchronizing the system clock solved the problem.
Hope this helps!