How to submit a Spark job whose jar is hosted in an S3 object store
Question
I have a Spark cluster with YARN, and I want to put my job's jar into a 100% S3-compatible object store. To submit the job, searching Google suggests it is as simple as:

spark-submit --master yarn --deploy-mode cluster <...other parameters...> s3://my_bucket/jar_file

However, the S3 object store requires a username and password for access. So how can I configure those credentials to let Spark download the jar from S3? Many thanks!
Answer
I needed to download the following jars from Maven and put them into the Spark jars dir in order to allow the s3a scheme to be used in spark-submit (note: you can use the --packages directive to reference these dependencies from inside your jar, but not from spark-submit itself):
# build the Spark `assembly` project so that the jars directory below exists
sbt "project assembly" package
cd assembly/target/scala-2.11/jars/
# fetch the hadoop-aws module and the AWS SDK version it was built against
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.7/hadoop-aws-2.7.7.jar
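
To address the credentials part of the question: Spark forwards any property prefixed with spark.hadoop. into the Hadoop configuration, which is where the s3a connector reads its settings from, so the access key, secret key and (for a non-AWS, S3-compatible store) the endpoint can be passed at submit time. A minimal sketch, where the bucket name, keys, endpoint URL, and main class are all hypothetical placeholders:

# sketch only: spark.hadoop.* properties are copied into the Hadoop
# configuration; replace the placeholder credentials, endpoint, and class
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.hadoop.fs.s3a.access.key=MY_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=MY_SECRET_KEY \
  --conf spark.hadoop.fs.s3a.endpoint=https://s3.my-store.example.com \
  --class com.example.MyJob \
  s3a://my_bucket/jar_file.jar

The same fs.s3a.* keys can instead be set once in core-site.xml on the cluster (or, with the spark.hadoop. prefix, in spark-defaults.conf), which keeps the secrets off the command line and out of the YARN job history.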