How to submit a Spark job whose jar is hosted in an S3 object store
Question
I have a Spark cluster with YARN, and I want to put my job's jar into a 100% S3-compatible object store. To submit the job, searching Google suggests it is as simple as:

spark-submit --master yarn --deploy-mode cluster <...other parameters...> s3://my_bucket/jar_file

However, the S3 object store requires a username and password for access. So how can I configure those credentials to let Spark download the jar from S3? Many thanks!
Answer
I needed to download the following jars from Maven and put them into the Spark jars dir in order to allow the s3a scheme to be used in spark-submit (note: you can use the --packages directive to reference these dependencies from inside your jar, but not from spark-submit itself):
# build the Spark `assembly` project so that the jars directory below exists
sbt "project assembly" package
cd assembly/target/scala-2.11/jars/
# fetch the hadoop-aws module and the AWS SDK version it was built against
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.7/hadoop-aws-2.7.7.jar
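
To address the credentials part of the question: Spark forwards any property prefixed with spark.hadoop. into the Hadoop configuration, which is where the s3a connector reads its settings from, so the access key, secret key and (for a non-AWS, S3-compatible store) the endpoint can be passed at submit time. A minimal sketch, where the bucket name, keys, endpoint URL, and main class are all hypothetical placeholders:

# sketch only: spark.hadoop.* properties are copied into the Hadoop
# configuration; replace the placeholder credentials, endpoint, and class
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.hadoop.fs.s3a.access.key=MY_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=MY_SECRET_KEY \
  --conf spark.hadoop.fs.s3a.endpoint=https://s3.my-store.example.com \
  --class com.example.MyJob \
  s3a://my_bucket/jar_file.jar

The same fs.s3a.* keys can instead be set once in core-site.xml on the cluster (or, with the spark.hadoop. prefix, in spark-defaults.conf), which keeps the secrets off the command line and out of the YARN job history.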