spark-submit: --jars does not work


Question

I am building a metrics system for a Spark Streaming job. In this system, metrics are collected in each executor, so a metrics source (a class used to collect metrics) needs to be initialized in each executor.

The metrics source is packaged in a jar. When submitting the job, the jar is sent from the local machine to each executor using the '--jars' parameter. However, the executor starts to initialize the metrics source class before the jar arrives, and as a result it throws a ClassNotFoundException.

It seems that if the executor could wait until all resources are ready, the issue would be resolved, but I really do not know how to do that.

Is anyone else facing the same issue?

PS: I tried using HDFS (copying the jar to HDFS, then submitting the job and letting the executor load the class from a path in HDFS), but it failed. I checked the source code, and it seems the class loader can only resolve local paths.

Here is the log. You can see that the jar is added to the classpath at 2016-01-15 18:08:07, but the initialization starts at 2016-01-15 18:07:26:

INFO 2016-01-15 18:08:07 org.apache.spark.executor.Executor: Adding file:/var/lib/spark/worker/worker-0/app-20160115180722-0041/0/./datainsights-metrics-source-assembly-1.0.jar to class loader

ERROR 2016-01-15 18:07:26 Logging.scala:96 - org.apache.spark.metrics.MetricsSystem: Source class org.apache.spark.metrics.PerfCounterSource cannot be instantiated

Here is the command I used:

spark-submit --verbose \
 --jars /tmp/datainsights-metrics-source-assembly-1.0.jar \
 --conf "spark.metrics.conf=metrics.properties" \
 --class org.microsoft.ofe.datainsights.StartServiceSignalPipeline \
 ./target/datainsights-1.0-jar-with-dependencies.jar
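For context, a custom source like this is typically registered through the metrics.properties file referenced by spark.metrics.conf above. A minimal sketch follows; the instance prefix and source name are illustrative, and the class name is taken from the error log:

```properties
# metrics.properties (sketch): entries of the form <instance>.source.<name>.class
# register a source. Spark's MetricsSystem instantiates this class when each
# executor JVM starts, which happens before jars shipped via --jars have been
# fetched -- hence the "Source class ... cannot be instantiated" error above.
executor.source.PerfCounter.class=org.apache.spark.metrics.PerfCounterSource
```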

Answer

I can think of a couple of options:

  1. Create a fat jar that includes the main classes and their dependencies.
  2. If the dependencies are used only by the executors and not by the driver, you can explicitly add the jar files using SparkConf.setJars(...); if the driver uses them too, you can also use the command-line option --driver-class-path to configure the driver classpath.
  3. Try configuring it in spark-defaults.conf using the following parameters:

spark.driver.extraClassPath=<classpath>
spark.executor.extraClassPath=<classpath>
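Option 3 can be sketched as follows, assuming the metrics jar has already been copied to the same local path on every node (the /opt/jars path is illustrative); extraClassPath entries are not shipped by spark-submit:

```properties
# spark-defaults.conf (sketch): the jar must already exist at this path on
# every node. These entries prepend it to the driver/executor classpaths at
# JVM startup, before the metrics system initializes.
spark.driver.extraClassPath    /opt/jars/datainsights-metrics-source-assembly-1.0.jar
spark.executor.extraClassPath  /opt/jars/datainsights-metrics-source-assembly-1.0.jar
```

Because the classpath is set when the executor JVM launches, the metrics source class is visible before the MetricsSystem initializes, which sidesteps the race with jars distributed via --jars.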

Whatever you do, I would suggest fixing the network latency, as otherwise it will hurt the performance of your Spark jobs.

