Spark atop of Docker not accepting jobs


Problem Description

I'm trying to make a hello world example work with spark+docker, and here is my code.

import org.apache.spark.SparkContext

object Generic {
  def main(args: Array[String]) {
    // Connect to the standalone master running inside the Docker container
    val sc = new SparkContext("spark://172.17.0.3:7077", "Generic", "/opt/spark-0.9.0")

    // Monte Carlo estimate of pi: sample random points in the unit square
    // and count how many fall inside the unit circle
    val NUM_SAMPLES = 100000
    val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
      val x = Math.random * 2 - 1
      val y = Math.random * 2 - 1
      if (x * x + y * y < 1) 1.0 else 0.0
    }.reduce(_ + _)

    println("Pi is roughly " + 4 * count / NUM_SAMPLES)
  }
}

When I run sbt run, I get

14/05/28 15:19:58 INFO client.AppClient$ClientActor: Connecting to master spark://172.17.0.3:7077...
14/05/28 15:20:08 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

I checked both the cluster UI, where I have 3 nodes that each have 1.5g of memory, and the namenode UI, where I see the same thing.

The docker logs show no output from the workers and the following from the master:

14/05/28 21:20:38 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@master:7077] -> [akka.tcp://spark@10.0.3.1:48085]: Error [Association failed with [akka.tcp://spark@10.0.3.1:48085]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@10.0.3.1:48085]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /10.0.3.1:48085

]

This happens a couple of times, and then the program times out and dies with

[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Spark cluster looks down

When I do a tcpdump over the docker0 interface, it looks like the workers and the master node are talking.

However, the spark console works.

If I set sc as val sc = new SparkContext("local", "Generic", System.getenv("SPARK_HOME")), the program runs

Recommended Answer

I've been there. The issue looks like the Akka actor subsystem in Spark is binding to a different interface than docker0.

Your master is on: spark://172.17.0.3:7077

Akka is binding at: akka.tcp://spark@10.0.3.1:48085

If your masters/slaves are docker containers, they should be communicating through the docker0 interface in the 172.17.x.x range.

Try providing the master and slaves with their correct local IP using the env config SPARK_LOCAL_IP. See config docs for details.
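
For example, a minimal sketch of what this could look like with the master and each worker in their own container (the addresses 172.17.0.3 and 172.17.0.4 are assumptions here; substitute whatever your containers actually get on docker0):

# On the master container (assumed to be 172.17.0.3 on docker0)
export SPARK_LOCAL_IP=172.17.0.3
${SPARK_HOME}/bin/spark-class org.apache.spark.deploy.master.Master --ip 172.17.0.3

# On each worker container (assumed to be 172.17.0.4 on docker0)
export SPARK_LOCAL_IP=172.17.0.4
${SPARK_HOME}/bin/spark-class org.apache.spark.deploy.worker.Worker spark://172.17.0.3:7077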

In our docker setup for Spark 0.9 we are using this command to start the slaves:

${SPARK_HOME}/bin/spark-class org.apache.spark.deploy.worker.Worker $MASTER_IP -i $LOCAL_IP 

This directly provides the local IP to the worker.
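
One way to pick up $LOCAL_IP automatically at container startup (an assumption about the image, not part of the setup above; it relies on Docker mapping the container's hostname to its bridge address in /etc/hosts) would be something like:

# Hypothetical startup snippet inside the worker container
LOCAL_IP=$(hostname -i | awk '{print $1}')
${SPARK_HOME}/bin/spark-class org.apache.spark.deploy.worker.Worker $MASTER_IP -i $LOCAL_IP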
