How to use spark-submit configuration: --jars, --packages in cluster mode?


Problem description

When using spark-submit in cluster mode (yarn-cluster), the jars and packages configuration confuses me. For jars, I can put them on HDFS instead of in a local directory. But for packages, which are resolved as Maven coordinates, pointing at HDFS does not work. My invocation looks like this:

spark-submit --jars hdfs:///mysql-connector-java-5.1.39-bin.jar \
  --driver-class-path /home/liac/test/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar \
  --conf "spark.mongodb.input.uri=mongodb://192.168.27.234/test.myCollection2?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://192.168.27.234/test.myCollection2" \
  --packages com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0 \
  --py-files /home/liac/code/diagnose_disease/tool.zip \
  main_disease_tag_spark.py --master yarn-client

The error that occurs:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Provided Maven Coordinates must be in the form 'groupId:artifactId:version'. The coordinate provided is: com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0

Can anyone tell me how to use jars and packages in cluster mode? And what's wrong with my approach?

Answer

Your use of the --packages argument is wrong:

--packages com.mongodb.spark:hdfs:///user/liac/package/jars/mongo-spark-connector_2.11-1.0.0-assembly.jar:1.0.0

It needs to be in the form groupId:artifactId:version, as the error message says. You cannot use a URL with it.

An example of using MongoDB with Spark, relying on the built-in repository support:

$SPARK_HOME/bin/spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.11:1.0.0
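The same flag works with spark-submit. A minimal sketch of a yarn-cluster submission focusing only on the --packages flag (the application file and the zip of helper code are copied from the question; whether they match your actual layout is an assumption):

# Sketch: submit the PySpark application in cluster mode and let spark-submit
# resolve the connector from Maven Central via --packages.
# File names are taken from the question and may need adjusting.
spark-submit \
  --master yarn-cluster \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:1.0.0 \
  --py-files /home/liac/code/diagnose_disease/tool.zip \
  main_disease_tag_spark.py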

If you insist on using your own jar, you can provide it via --repositories. The value of the argument is

a comma-separated list of remote repositories to search for the Maven coordinates specified in --packages.

For example, in your case, it could be

--repositories hdfs:///user/liac/package/jars/ --packages org.mongodb.spark:mongo-spark-connector_2.11:1.0.0
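
Putting the pieces together for the command in the question, a sketch of the corrected invocation follows. Paths, hosts, and file names are copied from the question and are assumptions about your environment; the substantive changes are the --packages coordinate, the added --repositories flag, and moving --master in front of the application file:

# Sketch of the corrected submission: --packages now carries a plain Maven
# coordinate and --repositories points at the asker's own repository location.
# All paths and URIs are copied from the question, not verified.
spark-submit \
  --master yarn-cluster \
  --jars hdfs:///mysql-connector-java-5.1.39-bin.jar \
  --driver-class-path /home/liac/test/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar \
  --conf "spark.mongodb.input.uri=mongodb://192.168.27.234/test.myCollection2?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://192.168.27.234/test.myCollection2" \
  --repositories hdfs:///user/liac/package/jars/ \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:1.0.0 \
  --py-files /home/liac/code/diagnose_disease/tool.zip \
  main_disease_tag_spark.py

Note that everything after the primary .py file is passed to the application itself rather than to spark-submit, so the --master flag from the original command has to come before main_disease_tag_spark.py.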

