SparkContext.addJar does not work in local mode

Problem Description

When a jar file is needed in a Spark job, it can be added to the job in two ways:
1. The --jars path option on the command line.
2. SparkContext.addJar("path").
Can anyone tell me the difference between these two ways?
From this question, the answer is that they are identical and only the priority differs, but I don't think that is true. If I submit the Spark job in yarn-cluster mode, addJar() will not work if the jar files are not included via the --jars option on the command line, according to the official site:

The --jars option allows the SparkContext.addJar function to work if you are using it with local files and running in yarn-cluster mode. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files.

The reason is that the driver runs on a different machine than the client. So it seems that the --jars option on the command line is resolved on the client side, while addJar() can only work with jars that are present on the driver.
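
For reference, the two ways I am comparing look roughly like this (the jar path, the class name com.example.Main and the application jar are placeholders, not names from my project):

    # way 1: pass the jar on the command line when submitting
    spark-submit --master yarn-cluster --class com.example.Main --jars /path/to/mylib.jar my-app.jar

    # way 2: register the jar programmatically inside the job (Scala)
    sc.addJar("/path/to/mylib.jar")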

Then I did a test in local mode.

1. spark-shell --master local --jars path/to/jar

If I start spark-shell this way, objects in the jar can be used in the spark-shell.
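
For example, with com.example.MyClass standing in for a class that is actually packaged in the jar, a session like this works:

    scala> import com.example.MyClass
    import com.example.MyClass

    scala> val m = new MyClass()   // resolves, since the jar is on the REPL classpath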

2. spark-shell --master local

If I start spark-shell this way and then use sc.addJar("path/to/jar"), objects in the jar file cannot be imported into the spark-shell, and I get a "class cannot be found" error.
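
Roughly what that failing session looks like (same placeholder class name):

    scala> sc.addJar("path/to/jar")

    scala> import com.example.MyClass
    // fails: MyClass cannot be found, because the jar never reaches the REPL/driver classpath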

My questions are:

Why does the method SparkContext.addJar() not work in local mode?

What is the difference between SparkContext.addJar() and --jars?

My environment: a Hortonworks 2.5 cluster, with Spark 1.6.2. I would appreciate it if anyone could shed some light on this.

Recommended Answer

Well, after some research, I found the reason. I'm posting it here in case anyone else runs into this problem.

The addJar() method does not add jars to the driver's classpath. What it does is locate the jar on the driver node, distribute it to the worker nodes, and add it to the executors' classpaths.
Because I submit my Spark job in local mode, the driver classpath is (I guess) what the job uses, so the jars added via addJar() cannot be found.
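
A quick way to see that the driver's classpath is untouched (placeholder class name again):

    scala> sc.addJar("/path/to/mylib.jar")

    scala> Class.forName("com.example.MyClass")
    java.lang.ClassNotFoundException: com.example.MyClass

addJar() only registers the jar so that executors can fetch it and add it to their task classloader; the classloader used by the driver (and by the spark-shell compiler) is left alone.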

To solve this problem, include all jars with the --jars option when submitting the Spark job, or add the jars with --driver-class-path.
More details can be found here.
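
For example, either of the following starts a local spark-shell that can import classes from the jar (the path is a placeholder):

    spark-shell --master local --jars /path/to/mylib.jar
    spark-shell --master local --driver-class-path /path/to/mylib.jar

Note that --driver-class-path only extends the driver's classpath; in local mode that is enough, but on a real cluster you still want --jars so the jar is also shipped to the executors.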
