How to run Spark on Docker?

Problem description

Can't run Apache Spark on Docker.

When I try to communicate from my driver to the Spark master I receive the following error:

15/04/03 13:08:28 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Recommended answer

This error sounds like the workers have not registered with the master.

This can be checked on the master's Spark web UI at http://<masterip>:8080
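As a rough command-line equivalent (assuming the master address 192.168.1.10 used later in this answer), you can fetch the master page and look for registered worker IDs; empty output would explain the "Initial job has not accepted any resources" warning:

# Rough sketch: list worker IDs shown on the standalone master's status page (exact markup may vary by Spark version).
curl -s http://192.168.1.10:8080 | grep -io 'worker-[0-9]*' | sort -u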

You could also simply use a different Docker image, or compare Docker images with one that works and see what is different.

I have dockerized a spark master and a spark worker.

If you have a Linux machine sitting behind a NAT router, like a home firewall, that allocates addresses in the private 192.168.1.* network to the machines, this script will download a Spark 1.3.1 master and a worker to run in separate Docker containers with addresses 192.168.1.10 and .11 respectively. You may need to tweak the addresses if 192.168.1.10 and 192.168.1.11 are already in use on your LAN.

pipework is a utility for bridging the LAN to the container instead of using the internal Docker bridge.

Spark requires all of the machines to be able to communicate with each other. As far as I can tell, Spark is not hierarchical; I've seen the workers try to open ports to each other. So in the shell script I expose all the ports, which is OK if the machines are otherwise firewalled, such as behind a home NAT router.

./run-docker-spark

#!/bin/bash
# Refresh sudo credentials up front; pipework needs root to move network interfaces.
sudo -v
# Start the master container, pre-populating /etc/hosts entries for the cluster nodes
# and exposing every port so Spark's dynamically chosen ports are reachable.
MASTER=$(docker run --name="master" -h master --add-host master:192.168.1.10 --add-host spark1:192.168.1.11 --add-host spark2:192.168.1.12 --add-host spark3:192.168.1.13 --add-host spark4:192.168.1.14 --expose=1-65535 --env SPARK_MASTER_IP=192.168.1.10 -d drpaulbrewer/spark-master:latest)
# Bridge the master container onto the LAN with a static address via pipework.
sudo pipework eth0 $MASTER 192.168.1.10/24@192.168.1.1
# Start one worker container pointed at the master, then bridge it onto the LAN as well.
SPARK1=$(docker run --name="spark1" -h spark1 --add-host home:192.168.1.8 --add-host master:192.168.1.10 --add-host spark1:192.168.1.11 --add-host spark2:192.168.1.12 --add-host spark3:192.168.1.13 --add-host spark4:192.168.1.14 --expose=1-65535 --env mem=10G --env master=spark://192.168.1.10:7077 -v /data:/data -v /tmp:/tmp -d drpaulbrewer/spark-worker:latest)
sudo pipework eth0 $SPARK1 192.168.1.11/24@192.168.1.1
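As a rough follow-up check (container names and addresses as in the script above), you can verify that both containers are up, that the pipework-assigned addresses answer, and that the master has seen the worker register:

docker ps | grep -E 'master|spark1'              # both containers should show as Up
ping -c1 192.168.1.10 && ping -c1 192.168.1.11   # pipework-assigned LAN addresses should respond
docker logs master | grep -i 'registering worker' # master log entry when spark1 joins; exact wording may vary by Spark version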

After running this script I can see the master web report at 192.168.1.10:8080, or go to another machine on my LAN that has a Spark distribution, run ./spark-shell --master spark://192.168.1.10:7077, and it will bring up an interactive Scala shell.
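To go one step further than just opening the shell, a trivial job forces the master to hand a task to a registered worker; a minimal sketch, run from the bin directory of a Spark 1.3.x distribution as above:

# Pipe a tiny job into spark-shell; it only completes once a worker has accepted the task.
echo 'sc.parallelize(1 to 1000).count()' | ./spark-shell --master spark://192.168.1.10:7077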
