How can I run spark in headless mode in my custom version on HDP?
Problem description
How can I run spark in headless mode? Currently, I am executing spark on HDP 2.6.4 (i.e. spark 2.2 is installed by default) on the cluster. I have downloaded the spark 2.4.1 Scala 2.11 release in headless mode (i.e. no hadoop jars are built in) from https://spark.apache.org/downloads.html. The exact name is: pre-built with scala 2.11 and user-provided hadoop.
Now, when trying to run it, I follow https://spark.apache.org/docs/latest/hadoop-provided.html:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/home/<<my_user>>/development/software/spark_no_provided_hadoop
./bin/spark-shell --master yarn --deploy-mode client --queue <<my_yarn_queue>>
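As a quick sanity check before launching (a sketch; the exact jar paths differ per cluster), it helps to confirm that the two variables actually resolve:

echo $SPARK_DIST_CLASSPATH   # should list the HDP hadoop jars, not be empty
ls $HADOOP_CONF_DIR          # should contain core-site.xml, yarn-site.xml, etc.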
Unfortunately, it fails to start:
19/05/01 07:12:23 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/05/01 07:12:38 ERROR cluster.YarnClientSchedulerBackend: The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
19/05/01 07:12:38 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Application application_1555489055691_64276 failed 2 times due to AM Container for appattempt_1555489055691_64276_000002 exited with exitCode: 1
When looking at the logs for details I see:
Log Type: prelaunch.err
launch_container.sh: line 30: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:/etc/hadoop/conf:/usr/hdp/2.6.4.0-91/hadoop/*:/usr/hdp/2.6.4.0-91/hadoop/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:/usr/hdp/2.6.4.0-91/hadoop/conf:/usr/hdp/2.6.4.0-91/hadoop/lib/*:/usr/hdp/2.6.4.0-91/hadoop/.//*:/usr/hdp/2.6.4.0-91/hadoop-hdfs/./:/usr/hdp/2.6.4.0-91/hadoop-hdfs/lib/*:/usr/hdp/2.6.4.0-91/hadoop-hdfs/.//*:/usr/hdp/2.6.4.0-91/hadoop-yarn/lib/*:/usr/hdp/2.6.4.0-91/hadoop-yarn/.//*:/usr/hdp/2.6.4.0-91/hadoop-mapreduce/lib/*:/usr/hdp/2.6.4.0-91/hadoop-mapreduce/.//*:/usr/hdp/2.6.4.0-91/tez/*:/usr/hdp/2.6.4.0-91/tez/lib/*:/usr/hdp/2.6.4.0-91/tez/conf:$PWD/__spark_conf__/__hadoop_conf__: bad substitution
So:
/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar: bad substitution
is the cause (and similar to https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html), but this is completely inside Ambari's management domain. How can I work around it to run a more recent version of spark (2.4.x) on the existing 2.6.x HDP platform?
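One way to trace where the unresolved placeholder enters the container classpath (assuming the stock HDP configuration layout) is to grep the Hadoop configuration directory:

# list every config entry that still contains the literal ${hdp.version}
grep -R 'hdp.version' /etc/hadoop/conf

On HDP this typically shows up in mapred-site.xml under mapreduce.application.classpath, which is where the hadoop-lzo jar path in the error above comes from.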
Assuming I had passed a wrong configuration directory for HADOOP_CONF_DIR, I unset it. But then:
When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
so it must be passed. Could it be that I am passing the wrong value? According to Exception: java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment. in spark, the value could be correct. For me, no HADOOP_HOME is set by default.
Even when setting export HADOOP_CONF_DIR=/usr/hdp/current/spark2-client/conf, the same bad substitution error remains.
NOTE: some interesting steps:
- https://community.hortonworks.com/articles/244059/steps-to-install-supplementary-spark-on-hdp-cluste.html, but not for the headless edition
- https://community.hortonworks.com/questions/85757/how-to-add-the-hadoop-and-yarn-configuration-file.html
Answer
Indeed, https://community.hortonworks.com/questions/23699/bad-substitution-error-running-spark-on-yarn.html is the solution:
cd /usr/hdp
ls
2.6.xxx current share
So for me:
./bin/spark-shell --master yarn --deploy-mode client --queue <<my_queue>> --conf spark.driver.extraJavaOptions='-Dhdp.version=2.6.xxx' --conf spark.yarn.am.extraJavaOptions='-Dhdp.version=2.6.xxx'
works.
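As a follow-up, the -D flags do not have to be repeated on every invocation; the same settings can live in the custom Spark install's spark-defaults.conf (a sketch; 2.6.xxx stands for whatever version ls /usr/hdp printed):

# $SPARK_HOME/conf/spark-defaults.conf
spark.driver.extraJavaOptions    -Dhdp.version=2.6.xxx
spark.yarn.am.extraJavaOptions   -Dhdp.version=2.6.xxx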