What is the use of the method addJar() in Spark?


Problem description

In a Spark job, I don't know how to import and use the jars shared by the method SparkContext.addJar(). It seems that this method can move the jars to some place that is accessible by the other nodes in the cluster, but I do not know how to import them.

This is an example:

package utils;

// A trivial helper class, compiled and packaged into utils.jar.
public class addNumber {
    public int addOne(int i) {
        return i + 1;
    }

    public int addTwo(int i) {
        return i + 2;
    }
}

I created a class called addNumber and packaged it into a jar file, utils.jar.

Then I created a Spark job; the code is shown below:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object TestDependencies {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf
    val sc = new SparkContext(sparkConf)
    // Share the jar with the rest of the cluster.
    sc.addJar("/path/to/utils.jar")

    val data = (1 to 100).toList
    val rdd = sc.makeRDD(data)

    // Use the class from the shared jar inside a task.
    val rdd_1 = rdd.map { x =>
      val handler = new utils.addNumber
      handler.addOne(x)
    }

    rdd_1.collect().foreach { x => print(x + "||") }
  }
}

The error "java.lang.NoClassDefFoundError: utils/addNumber" is raised after the job is submitted with spark-submit.

I know that the method addJar() does not guarantee that the jars are included in the classpath of the Spark job. If I want to use the jar files, I have to move all of the dependencies to the same path on every node of the cluster. But if I can move and include all of the jars myself, what is the use of the method addJar()?

I am wondering if there is a way to use the jars imported by the method addJar(). Thanks in advance.

Answer

Did you try setting the path of the jar with the "local" prefix? From the documentation:

public void addJar(String path)

Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
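For example, if utils.jar has already been copied to the same path on every worker node, the local: scheme tells Spark not to ship the file but to use the copy that is already there. A minimal sketch, assuming the jar sits at the hypothetical path /opt/jars/utils.jar on each worker:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object TestDependenciesLocal {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf)

    // local:/ means "already present on every worker node at this path";
    // Spark does not copy the file, it only adds it to the task classpath.
    sc.addJar("local:/opt/jars/utils.jar")

    val rdd = sc.makeRDD((1 to 100).toList)
    val rdd_1 = rdd.map { x =>
      val handler = new utils.addNumber
      handler.addOne(x)
    }
    rdd_1.collect().foreach { x => print(x + "||") }
  }
}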

You can also try this way:

val conf = new SparkConf()
             .setMaster("local[*]")
             .setAppName("tmp")
             .setJars(Array("/path1/one.jar", "/path2/two.jar"))

val sc = new SparkContext(conf)
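Note that addJar() and setJars() make the jars available to the executors for task execution; they do not change the classpath of the already-running driver JVM. If the driver itself needs classes from the jar, pass it with --jars on spark-submit or put it on the driver classpath directly.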

Also take a look at the configuration documentation and check the spark.jars option,

and set the "--jars" param in spark-submit:

--jars /path/1.jar,/path/2.jar
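Put together, a full submission might look like this (a sketch; the master URL and the application jar name are placeholders, and TestDependencies is the example class from the question):

spark-submit \
  --class TestDependencies \
  --master spark://master:7077 \
  --jars /path/to/utils.jar \
  my-spark-job.jar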

or edit conf/spark-defaults.conf:

spark.driver.extraClassPath /path/1.jar:/fullpath/2.jar
spark.executor.extraClassPath /path/1.jar:/fullpath/2.jar
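Note that --jars takes a comma-separated list, while the extraClassPath entries use the platform classpath separator (":" on Linux). Also, extraClassPath does not distribute anything: the jars must already exist at those paths on the driver and on every executor node.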
