Transport Endpoint Not Connected - Mesos Slave / Master


Problem Description

I'm trying to connect a Mesos slave to its master. Whenver the slave tries to connect to the master, I get the following message:

I0806 16:39:59.090845   935 hierarchical.hpp:528] Added slave 20150806-163941-1027506442-5050-921-S3 (debian) with cpus(*):1; mem(*):1938; disk(*):3777; ports(*):[31000-32000] (allocated: )
E0806 16:39:59.091384   940 socket.hpp:107] Shutdown failed on fd=25: Transport endpoint is not connected [107]
I0806 16:39:59.091508   940 master.cpp:3395] Registered slave 20150806-163941-1027506442-5050-921-S3 at slave(1)@127.0.1.1:5051 (debian) with cpus(*):1; mem(*):1938; disk(*):3777; ports(*):[31000-32000]
I0806 16:39:59.091747   940 master.cpp:1006] Slave 20150806-163941-1027506442-5050-921-S3 at slave(1)@127.0.1.1:5051 (debian) disconnected
I0806 16:39:59.091868   940 master.cpp:2203] Disconnecting slave 20150806-163941-1027506442-5050-921-S3 at slave(1)@127.0.1.1:5051 (debian)
I0806 16:39:59.092031   940 master.cpp:2222] Deactivating slave 20150806-163941-1027506442-5050-921-S3 at slave(1)@127.0.1.1:5051 (debian)
I0806 16:39:59.092248   939 hierarchical.hpp:621] Slave 20150806-163941-1027506442-5050-921-S3 deactivated

The error seems to be:

E0806 16:39:59.091384   940 socket.hpp:107] Shutdown failed on fd=25: Transport endpoint is not connected [107]
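
(For context: 107 is the Linux errno ENOTCONN, returned when a socket call such as shutdown() is made on a socket that is no longer connected. Here the master accepts the slave's registration but then fails to shut the socket down cleanly, which suggests the connection back to the slave was never properly established.)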

The master is started with:

./mesos-master.sh --ip=10.129.62.61 --work_dir=~/Mesos/mesos-0.23.0/workdir/ --zk=zk://10.129.62.61:2181/mesos --quorum=1

And the slave with:

./mesos-slave.sh --master=zk://10.129.62.61:2181/mesos
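
Worth noting in the master log above: the slave registers as slave(1)@127.0.1.1:5051, i.e. it advertises a loopback-style address the master cannot connect back to. On Debian the machine's own hostname typically resolves to 127.0.1.1 via /etc/hosts. As a hedged sketch (not from the original post), one could force the slave to advertise its routable address with the --ip flag, using the slave address from the ifconfig output below:

# Assumption: 10.129.62.49 is the slave VM's bridged address (see ifconfig below).
./mesos-slave.sh --master=zk://10.129.62.61:2181/mesos --ip=10.129.62.49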

If I run the slave on the same VM as the master, it works fine.

I couldn't find much information on the internet. I'm running two virtual machines (Debian 8.1) on VirtualBox 5. The host is Windows 7.

Edit 1:

The master and the slave each run on their own dedicated VM.

Both VMs' networks are configured to use bridged networking.

ifconfig from master:

eth0      Link encap:Ethernet  HWaddr 08:00:27:cc:6c:6e
          inet addr:10.129.62.61  Bcast:10.129.255.255  Mask:255.255.0.0
          inet6 addr: fe80::a00:27ff:fecc:6c6e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5335953 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1422428 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:595886271 (568.2 MiB)  TX bytes:362423868 (345.6 MiB)

ifconfig from slave:

eth0      Link encap:Ethernet  HWaddr 08:00:27:56:83:20
          inet addr:10.129.62.49  Bcast:10.129.255.255  Mask:255.255.0.0
          inet6 addr: fe80::a00:27ff:fe56:8320/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4358561 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3825 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:397126834 (378.7 MiB)  TX bytes:354116 (345.8 KiB)
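
Given the two bridged addresses above, a quick sanity check (not in the original post) is to verify reachability in both directions, since the slave must reach the master on 5050 and the master must be able to connect back to the slave on 5051, the ports seen in the logs. A sketch, assuming nc is available on both VMs:

# On the slave (10.129.62.49): can the master's port be reached?
nc -zv 10.129.62.61 5050
# On the master (10.129.62.61): can the slave's port be reached back?
nc -zv 10.129.62.49 5051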

Edit 2:

The slave logs can be found at http://pastebin.com/CXZUBHKr

The master logs can be found at http://pastebin.com/thYR1par

Recommended Answer

I had a similar problem. My slave logs would be filled with

    E0812 15:58:04.017990  2193 socket.hpp:107] Shutdown failed on fd=13: Transport endpoint is not connected [107]

And my master would have

    F0120 20:45:48.025610 12116 master.cpp:1083] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins

And the master would die, a new election would occur, and the killed master would be restarted by upstart (I am on a CentOS 6 box) and added back into the pool of potential masters. Thus the elected master would daisy-chain around my master nodes. Many restarts of masters and slaves did nothing; the problem would consistently return within 1 minute of a master election.

The solution for me came from this Stack Overflow question (thanks) and a hint in a GitHub gist note.

The gist of it is that /etc/default/mesos-master must specify a quorum number (it needs to be correct for the number of Mesos masters; in my case, 3 masters):

    MESOS_QUORUM=2
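
(As a sanity check on the value: the quorum must be a strict majority of the masters, i.e. floor(N/2) + 1, which gives 2 for 3 masters. The single-master setup in the question correspondingly uses --quorum=1.)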

This seems odd to me, as I have the same information in the file /etc/mesos-master/quorum.

But I added it to /etc/default/mesos-master, restarted the mesos-masters and slaves, and the problem has not returned.
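
For completeness, a sketch of that restart step under upstart on CentOS 6; the job names mesos-master and mesos-slave are an assumption based on the stock Mesosphere packages:

    # On each master node (job name assumed from the Mesosphere packaging):
    sudo restart mesos-master
    # On each slave node:
    sudo restart mesos-slave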

Hope that helps.
