Apache Spark: "failed to launch org.apache.spark.deploy.worker.Worker" or Master


Problem description

I have created a Spark cluster on OpenStack, running on Ubuntu 14.04 with 8 GB of RAM. I created two virtual machines with 3 GB each (keeping 2 GB for the parent OS). Further, I created a master and 2 workers on the first virtual machine and 3 workers on the second machine.

The spark-env.sh file has the basic settings:

export SPARK_MASTER_IP=10.0.0.30
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=1

Whenever I deploy the cluster with start-all.sh, I get "failed to launch org.apache.spark.deploy.worker.Worker" and sometimes "failed to launch org.apache.spark.deploy.master.Master". When I look at the log file for the error, I see the following:

Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -cp /home/ubuntu/spark-1.5.1/sbin/../conf/:/home/ubuntu/spark-1.5.1/assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar:/home/ubuntu/spark-1.5.1/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/home/ubuntu/spark-1.5.1/lib_managed/jars/datanucleus-core-3.2.10.jar:/home/ubuntu/spark-1.5.1/lib_managed/jars/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip 10.0.0.30 --port 7077 --webui-port 8080

Though I get the failure message, the master or worker still becomes alive after a few seconds.
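
For example, a quick way to confirm they really are up is to list the running JVMs (assuming the JDK's jps tool is on the PATH; the log path below follows the usual Spark naming and may differ on your setup):

jps -l | grep -E 'org.apache.spark.deploy.(master.Master|worker.Worker)'
# or tail the daemon log that the "failed to launch" message points at:
tail -n 50 $SPARK_HOME/logs/spark-*-org.apache.spark.deploy.master.Master-*.out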

Can somebody explain the reason for this?

Answer

The Spark configuration system is a mess of environment variables, argument flags, and Java Properties files. I just spent a couple hours tracking down the same warning, and unraveling the Spark initialization procedure, and here's what I found:

  1. sbin/start-all.sh calls sbin/start-master.sh (and then sbin/start-slaves.sh)
  2. sbin/start-master.sh calls sbin/spark-daemon.sh start org.apache.spark.deploy.master.Master ...
  3. sbin/spark-daemon.sh start ... forks off a call to bin/spark-class org.apache.spark.deploy.master.Master ..., captures the resulting process id (pid), sleeps for 2 seconds, and then checks whether that pid's command's name is "java"
  4. bin/spark-class is a bash script, so it starts out with the command name "bash", and proceeds to:
     4.1. (re-)load the Spark environment by sourcing bin/load-spark-env.sh
     4.2. find the java executable
     4.3. find the right Spark jar
     4.4. call java ... org.apache.spark.launcher.Main ... to get the full classpath needed for a Spark deployment
     4.5. then finally hand over control, via exec, to java ... org.apache.spark.deploy.master.Master, at which point the command name becomes "java"

If steps 4.1 through 4.5 take longer than 2 seconds, which in my (and your) experience seems pretty much inevitable on a fresh OS where java has never been previously run, you'll get the "failed to launch" message, despite nothing actually having failed.
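
To illustrate, the check in step 3 boils down to something like this (a simplified sketch, not the actual spark-daemon.sh):

"$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.master.Master --ip 10.0.0.30 --port 7077 --webui-port 8080 &
newpid=$!          # pid of the bash process running spark-class
sleep 2            # fixed 2-second grace period
# if spark-class hasn't exec'd into java yet, the command name is still "bash"
if [[ "$(ps -p "$newpid" -o comm=)" != "java" ]]; then
  echo "failed to launch org.apache.spark.deploy.master.Master"
fi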

The slaves will complain for the same reason, and thrash around until the master is actually available, but they should keep retrying until they successfully connect to the master.

I've got a pretty standard Spark deployment running on EC2; I use the following files (example contents are sketched after the list):

  • conf/spark-defaults.conf to set spark.executor.memory and add some custom jars via spark.{driver,executor}.extraClassPath
  • conf/spark-env.sh to set SPARK_WORKER_CORES=$(($(nproc) * 2))
  • conf/slaves to list my slaves
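
For concreteness, example contents of those three files might look like this (the memory size, jar path and worker IPs are illustrative values, not my actual ones):

# conf/spark-defaults.conf
spark.executor.memory          2g
spark.driver.extraClassPath    /opt/extra-jars/my-custom.jar
spark.executor.extraClassPath  /opt/extra-jars/my-custom.jar

# conf/spark-env.sh
export SPARK_WORKER_CORES=$(($(nproc) * 2))

# conf/slaves -- one worker hostname or IP per line
10.0.0.31
10.0.0.32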

Here's how I start a Spark deployment, bypassing some of the {bin,sbin}/*.sh minefield/maze:

# on master, with SPARK_HOME and conf/slaves set appropriately
mapfile -t ARGS < <(java -cp $SPARK_HOME/lib/spark-assembly-1.6.1-hadoop2.6.0.jar org.apache.spark.launcher.Main org.apache.spark.deploy.master.Master | tr '\0' '\n')
# $ARGS now contains the full call to start the master, which I daemonize with nohup
SPARK_PUBLIC_DNS=0.0.0.0 nohup "${ARGS[@]}" >> $SPARK_HOME/master.log 2>&1 < /dev/null &

I'm still using sbin/spark-daemon.sh to start the slaves, since that's easier than calling nohup within the ssh command:

MASTER=spark://$(hostname -i):7077
while read -r; do
  ssh -o StrictHostKeyChecking=no $REPLY "$SPARK_HOME/sbin/spark-daemon.sh start org.apache.spark.deploy.worker.Worker 1 $MASTER" &
done <$SPARK_HOME/conf/slaves
# this forks the ssh calls, so wait for them to exit before you logout
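
If you want to double-check that the workers registered, the master's web UI (default port 8080) lists them; a hedged quick check from any machine:

# the IP matches SPARK_MASTER_IP above; adjust if yours differs
curl -s -o /dev/null -w '%{http_code}\n' http://10.0.0.30:8080   # 200 => master web UI is up
# the UI page itself lists every worker that has successfully registered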

There! It assumes that I'm using all the default ports and stuff, and that I'm not doing stupid shit like putting whitespace in filenames, but I think it's cleaner this way.
