How to submit a Spark job whose jar is hosted in an S3 object store

Question

I have a Spark cluster with YARN, and I want to put my job's jar into a 100% S3-compatible object store. From what I found on Google, submitting the job looks as simple as this: spark-submit --master yarn --deploy-mode cluster <...other parameters...> s3://my_bucket/jar_file. However, the S3 object store requires a username and password for access. So how can I configure those credentials so that Spark can download the jar from S3? Many thanks!

Recommended Answer

I needed to download the following jars from Maven and put them into Spark's jars directory in order to use the s3a scheme in spark-submit (note: you can use the --packages directive to reference these dependencies from inside your jar, but not from spark-submit itself):

# Build the Spark `assembly` project so the jars/ directory exists
sbt "project assembly" package
cd assembly/target/scala-2.11/jars/
# Fetch the AWS SDK and the matching hadoop-aws module into Spark's jar dir
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.7/hadoop-aws-2.7.7.jar
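The jars above only enable the s3a:// scheme; the credentials themselves go into the Hadoop configuration, and Spark forwards any spark.hadoop.* property given on the command line into that configuration. A minimal sketch of the resulting submit command, assuming placeholder values for the endpoint, keys, bucket, and main class (replace them with your own):

# Endpoint, keys, bucket and class below are hypothetical placeholders
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.hadoop.fs.s3a.endpoint=https://my-object-store.example.com \
  --conf spark.hadoop.fs.s3a.access.key=MY_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=MY_SECRET_KEY \
  --class com.example.MyJob \
  s3a://my_bucket/jar_file.jar

Note that the jar URL uses s3a:// rather than s3://. Alternatively, the same fs.s3a.* properties can be set once in Hadoop's core-site.xml on the cluster nodes, so the keys do not end up in shell history.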
