Setting Spark classpaths on EC2: spark.driver.extraClassPath and spark.executor.extraClassPath


Problem description

Reducing application JAR size by supplying Maven dependencies on the Spark classpath:

My cluster has 3 EC2 instances on which Hadoop and Spark are running. If I build the jar with Maven dependencies bundled in, it becomes too large (around 100 MB). I want to avoid this, because the jar gets replicated to all nodes each time I run the job.

To avoid that, I built the application jar with a plain "mvn package". For dependency resolution I downloaded all the Maven dependencies on each node beforehand and then provided only the jar paths below.

I added the classpath in spark-defaults.conf on each node:

spark.driver.extraClassPath /home/spark/.m2/repository/com/google/code/gson/gson/2.3.1/gson-2.3.1.jar:/home/spark/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.5/cassandra-driver-core-2.1.5.jar:/home/spark/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar:/home/spark/.m2/repository/com/google/collections/google-collections/1.0/google-collections-1.0.jar:/home/spark/.m2/repository/com/datastax/spark/spark-cassandra-connector-java_2.10/1.2.0-rc1/spark-cassandra-connector-java_2.10-1.2.0-rc1.jar:/home/spark/.m2/repository/com/datastax/spark/spark-cassandra-connector_2.10/1.2.0-rc1/spark-cassandra-connector_2.10-1.2.0-rc1.jar:/home/spark/.m2/repository/org/apache/cassandra/cassandra-thrift/2.1.3/cassandra-thrift-2.1.3.jar:/home/spark/.m2/repository/org/joda/joda-convert/1.2/joda-convert-1.2.jar

It has worked locally on a single node, but I am still getting this error on the cluster. Any help will be appreciated.

Recommended answer

Finally, I was able to solve the problem. I created the application jar using "mvn package" instead of "mvn clean compile assembly:single", so that it does not bundle the Maven dependencies while creating the jar (but these jars/dependencies need to be provided at run time). This results in a small jar, as it only references the dependencies.
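For context, `assembly:single` is a goal of the Maven Assembly Plugin; a minimal sketch of the plugin configuration it typically relies on (the `jar-with-dependencies` descriptor is the standard way to produce a fat jar, but this fragment is an illustration, not taken from the question's pom.xml):

```xml
<!-- Sketch: with this plugin configured, `mvn clean compile assembly:single`
     bundles all dependencies into one fat jar. A plain `mvn package` (without
     invoking this goal) yields the small, dependency-free jar described above. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
</plugin>
```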

Then I added the below two parameters in spark-defaults.conf on each node:

spark.driver.extraClassPath /home/spark/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.7/cassandra-driver-core-2.1.7.jar:/home/spark/.m2/repository/com/googlecode/json-simple/json-simple/1.1/json-simple-1.1.jar:/home/spark/.m2/repository/com/google/code/gson/gson/2.3.1/gson-2.3.1.jar:/home/spark/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar

spark.executor.extraClassPath /home/spark/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.7/cassandra-driver-core-2.1.7.jar:/home/spark/.m2/repository/com/googlecode/json-simple/json-simple/1.1/json-simple-1.1.jar:/home/spark/.m2/repository/com/google/code/gson/gson/2.3.1/gson-2.3.1.jar:/home/spark/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar
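The values above are colon-separated absolute jar paths following the standard local Maven repository layout (`<repo>/<group dirs>/<artifact>/<version>/<artifact>-<version>.jar`). Rather than typing them by hand, such a string can be assembled from the Maven coordinates; a minimal sketch (the helper function is hypothetical, the coordinates are the ones from the answer):

```python
import os

def m2_jar_path(repo, group_id, artifact_id, version):
    """Return the conventional local-repository path of one jar."""
    group_dirs = group_id.split(".")  # e.g. com.google.guava -> com/google/guava
    return os.path.join(repo, *group_dirs, artifact_id, version,
                        f"{artifact_id}-{version}.jar")

repo = "/home/spark/.m2/repository"
coords = [
    ("com.datastax.cassandra", "cassandra-driver-core", "2.1.7"),
    ("com.googlecode.json-simple", "json-simple", "1.1"),
    ("com.google.code.gson", "gson", "2.3.1"),
    ("com.google.guava", "guava", "16.0.1"),
]

# Colon-separated list, as spark.driver.extraClassPath /
# spark.executor.extraClassPath expect on Linux.
extra_class_path = ":".join(m2_jar_path(repo, g, a, v) for g, a, v in coords)
print(extra_class_path)
```

This only formats the paths; the jars still have to exist at those locations on every node.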

So the question arises: how will the application jar get the Maven dependencies (the required jars) at run time?

For that, I downloaded all the required dependencies on each node in advance using "mvn clean compile assembly:single".
