What is the use of method addJar() in Spark?
Problem Description
In a Spark job, I don't know how to import and use the jars shared by the method SparkContext.addJar(). It seems that this method is able to move jars to some place that is accessible by other nodes in the cluster, but I do not know how to import them.
Here is an example:
package utils;

// Simple utility class to be packaged into utils.jar
public class addNumber {
    public int addOne(int i) {
        return i + 1;
    }

    public int addTwo(int i) {
        return i + 2;
    }
}
I create a class called addNumber and package it into a jar file, utils.jar.
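For reference, one way to build such a jar from the command line (assuming the source file lives at utils/addNumber.java; the paths are illustrative):

javac utils/addNumber.java
jar cf utils.jar utils/addNumber.class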
Then I create a Spark job; the code is shown below:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object TestDependencies {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf
    val sc = new SparkContext(sparkConf)

    // Ship utils.jar to the other nodes of the cluster
    sc.addJar("/path/to/utils.jar")

    val data = (1 to 100).toList
    val rdd = sc.makeRDD(data)

    // Each task instantiates the class from the shared jar
    val rdd_1 = rdd.map { x =>
      val handler = new utils.addNumber
      handler.addOne(x)
    }

    rdd_1.collect().foreach { x => print(x + "||") }
  }
}
The error "java.lang.NoClassDefFoundError: utils/addNumber" raised after submission of the job through command "spark-submit"
.
I know that the method addJar() does not guarantee that jars are included in the class path of the Spark job. If I want to use the jar files, I have to move all of the dependencies to the same path on each node of the cluster. But if I can move and include all of the jars myself, what is the use of the method addJar()?
I am wondering if there is a way to use the jars imported by the method addJar(). Thanks in advance.
Recommended Answer
Did you try setting the path of the jar with the "local:" prefix? From the documentation:
public void addJar(String path)
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
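For instance, a minimal sketch, assuming utils.jar has already been copied to /opt/jars on every worker node (the path is hypothetical):

// The jar must already exist at this path on every worker node
sc.addJar("local:/opt/jars/utils.jar")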
You can also try it this way:
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("tmp")
  // Jars listed here are shipped to the executors at startup
  .setJars(Array("/path1/one.jar", "/path2/two.jar"))
val sc = new SparkContext(conf)
and take a look here to check the spark.jars option,
and set "--jars" param in spark-submit:
--jars /path/1.jar,/path/2.jar
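A complete invocation might look like this sketch (the application jar path is hypothetical):

spark-submit \
  --class TestDependencies \
  --jars /path/to/utils.jar \
  /path/to/test-dependencies.jar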
or edit conf/spark-defaults.conf (note that extraClassPath entries are not shipped by Spark; the jars must already exist at those paths on every node):
spark.driver.extraClassPath /path/1.jar:/fullpath/2.jar
spark.executor.extraClassPath /path/1.jar:/fullpath/2.jar