Apache Spark: "failed to launch org.apache.spark.deploy.worker.Worker" or Master
Problem Description
I have created a Spark cluster on OpenStack, running on Ubuntu 14.04 with 8 GB of RAM. I created two virtual machines with 3 GB each (keeping 2 GB for the parent OS). Further, I created a master and 2 workers on the first virtual machine and 3 workers on the second.
The spark-env.sh file has the basic settings:
export SPARK_MASTER_IP=10.0.0.30
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=1
Whenever I deploy the cluster with start-all.sh, I get "failed to launch org.apache.spark.deploy.worker.Worker" and sometimes "failed to launch org.apache.spark.deploy.master.Master". When I check the log file for the error, I see the following:
Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -cp /home/ubuntu/spark-1.5.1/sbin/../conf/:/home/ubuntu/spark-1.5.1/assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar:/home/ubuntu/spark-1.5.1/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/home/ubuntu/spark-1.5.1/lib_managed/jars/datanucleus-core-3.2.10.jar:/home/ubuntu/spark-1.5.1/lib_managed/jars/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip 10.0.0.30 --port 7077 --webui-port 8080
Though I get the failure message, the master or worker still becomes alive after a few seconds.
Can somebody explain the reason?
Answer
The Spark configuration system is a mess of environment variables, argument flags, and Java Properties files. I just spent a couple hours tracking down the same warning, and unraveling the Spark initialization procedure, and here's what I found:
1. sbin/start-all.sh calls sbin/start-master.sh (and then sbin/start-slaves.sh)
2. sbin/start-master.sh calls sbin/spark-daemon.sh start org.apache.spark.deploy.master.Master ...
3. sbin/spark-daemon.sh start ... forks off a call to bin/spark-class org.apache.spark.deploy.master.Master ..., captures the resulting process id (pid), sleeps for 2 seconds, and then checks whether that pid's command's name is "java"
4. bin/spark-class is a bash script, so it starts out with the command name "bash", and proceeds to:
4.1. (re-)load the Spark environment by sourcing bin/load-spark-env.sh
4.2. find the java executable
4.3. find the right Spark jar
4.4. call java ... org.apache.spark.launcher.Main ... to get the full classpath needed for a Spark deployment
4.5. then finally hand over control, via exec, to java ... org.apache.spark.deploy.master.Master, at which point the command name becomes "java"
If steps 4.1 through 4.5 take longer than 2 seconds, which in my (and your) experience seems pretty much inevitable on a fresh OS where java has never been previously run, you'll get the "failed to launch" message, despite nothing actually having failed.
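To make the race concrete, here's a simplified sketch of the launch-and-check sequence inside sbin/spark-daemon.sh; it's paraphrased from memory, not the literal script, and the variable names are illustrative:

nohup "$SPARK_HOME/bin/spark-class" "$command" "$@" >> "$log" 2>&1 < /dev/null &
newpid=$!                      # pid of the bash process running spark-class
sleep 2                        # grace period for spark-class to exec into java
# ps -o comm= prints only the command name for the given pid
if [[ "$(ps -p "$newpid" -o comm=)" != "java" ]]; then
  echo "failed to launch $command"   # fires even when the launch eventually succeeds
fi

If the exec happens at 2.1 seconds instead of 1.9, the only difference is this spurious message.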
The slaves will complain for the same reason, and thrash around until the master is actually available, but they should keep retrying until they successfully connect to the master.
I've got a pretty standard Spark deployment running on EC2; I use:
- conf/spark-defaults.conf to set spark.executor.memory and add some custom jars via spark.{driver,executor}.extraClassPath
- conf/spark-env.sh to set SPARK_WORKER_CORES=$(($(nproc) * 2))
- conf/slaves to list my slaves
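For concreteness, here's roughly what those three files contain; the memory size, jar path, and hostnames below are placeholders, not my actual values:

# conf/spark-defaults.conf
spark.executor.memory          4g
spark.driver.extraClassPath    /opt/jars/custom.jar
spark.executor.extraClassPath  /opt/jars/custom.jar

# conf/spark-env.sh
export SPARK_WORKER_CORES=$(($(nproc) * 2))

# conf/slaves -- one worker hostname per line
worker1.example.com
worker2.example.com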
Here's how I start a Spark deployment, bypassing some of the {bin,sbin}/*.sh minefield/maze:
# on master, with SPARK_HOME and conf/slaves set appropriately
# ask the launcher to print the full java command for the Master class,
# then split its NUL-separated output into a bash array
mapfile -t ARGS < <(java -cp "$SPARK_HOME/lib/spark-assembly-1.6.1-hadoop2.6.0.jar" org.apache.spark.launcher.Main org.apache.spark.deploy.master.Master | tr '\0' '\n')
# ARGS now contains the full call to start the master, which I daemonize with nohup
SPARK_PUBLIC_DNS=0.0.0.0 nohup "${ARGS[@]}" >> "$SPARK_HOME/master.log" 2>&1 < /dev/null &
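Once that's running, a couple of quick sanity checks (assuming the default web UI port of 8080):

tail -n 20 "$SPARK_HOME/master.log"                      # master should log that it's up and ALIVE
curl -s -o /dev/null -w '%{http_code}\n' localhost:8080  # 200 means the web UI is serving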
I'm still using sbin/spark-daemon.sh to start the slaves, since that's easier than calling nohup within the ssh command:
MASTER=spark://$(hostname -i):7077
while read -r; do
  # $REPLY holds the slave hostname read from conf/slaves;
  # the "1" is the worker instance number expected by spark-daemon.sh
  ssh -o StrictHostKeyChecking=no "$REPLY" "$SPARK_HOME/sbin/spark-daemon.sh start org.apache.spark.deploy.worker.Worker 1 $MASTER" &
done < "$SPARK_HOME/conf/slaves"
# this forks the ssh calls, so wait for them to exit before you logout
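The simplest way to honor that last comment is a bare wait, which blocks until every backgrounded ssh call has exited:

wait   # returns once all forked ssh calls above have finished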
There! It assumes that I'm using all the default ports and stuff, and that I'm not doing stupid shit like putting whitespace in filenames, but I think it's cleaner this way.