Transport Endpoint Not Connected - Mesos Slave / Master
Question
I'm trying to connect a Mesos slave to its master. Whenever the slave tries to connect to the master, I get the following message:
I0806 16:39:59.090845 935 hierarchical.hpp:528] Added slave 20150806-163941-1027506442-5050-921-S3 (debian) with cpus(*):1; mem(*):1938; disk(*):3777; ports(*):[31000-32000] (allocated: )
E0806 16:39:59.091384 940 socket.hpp:107] Shutdown failed on fd=25: Transport endpoint is not connected [107]
I0806 16:39:59.091508 940 master.cpp:3395] Registered slave 20150806-163941-1027506442-5050-921-S3 at slave(1)@127.0.1.1:5051 (debian) with cpus(*):1; mem(*):1938; disk(*):3777; ports(*):[31000-32000]
I0806 16:39:59.091747 940 master.cpp:1006] Slave 20150806-163941-1027506442-5050-921-S3 at slave(1)@127.0.1.1:5051 (debian) disconnected
I0806 16:39:59.091868 940 master.cpp:2203] Disconnecting slave 20150806-163941-1027506442-5050-921-S3 at slave(1)@127.0.1.1:5051 (debian)
I0806 16:39:59.092031 940 master.cpp:2222] Deactivating slave 20150806-163941-1027506442-5050-921-S3 at slave(1)@127.0.1.1:5051 (debian)
I0806 16:39:59.092248 939 hierarchical.hpp:621] Slave 20150806-163941-1027506442-5050-921-S3 deactivated
The error seems to be:
E0806 16:39:59.091384 940 socket.hpp:107] Shutdown failed on fd=25: Transport endpoint is not connected [107]
The master is started with:
./mesos-master.sh --ip=10.129.62.61 --work_dir=~/Mesos/mesos-0.23.0/workdir/ --zk=zk://10.129.62.61:2181/mesos --quorum=1
And the slave with:
./mesos-slave.sh --master=zk://10.129.62.61:2181/mesos
If I run the slave on the same VM as the master, it works fine.
I couldn't find much information on the internet. I'm running two virtual machines (Debian 8.1) on VirtualBox 5. The host is Windows 7.
Edit 1:
The master and the slave each run on a dedicated VM.
Both VMs' networks are configured using bridged networking.
ifconfig from master:
eth0 Link encap:Ethernet HWaddr 08:00:27:cc:6c:6e
inet addr:10.129.62.61 Bcast:10.129.255.255 Mask:255.255.0.0
inet6 addr: fe80::a00:27ff:fecc:6c6e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5335953 errors:0 dropped:0 overruns:0 frame:0
TX packets:1422428 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:595886271 (568.2 MiB) TX bytes:362423868 (345.6 MiB)
ifconfig from slave:
eth0 Link encap:Ethernet HWaddr 08:00:27:56:83:20
inet addr:10.129.62.49 Bcast:10.129.255.255 Mask:255.255.0.0
inet6 addr: fe80::a00:27ff:fe56:8320/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4358561 errors:0 dropped:0 overruns:0 frame:0
TX packets:3825 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:397126834 (378.7 MiB) TX bytes:354116 (345.8 KiB)
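One detail that may be worth checking, although the thread does not confirm it as the cause: the master log above shows the slave registering as slave(1)@127.0.1.1:5051, while ifconfig on the slave reports 10.129.62.49. The master command in the question passes --ip, but the slave command does not. The sketch below is only an illustration in Python; the helper names advertised_address and is_loopback are hypothetical. It pulls the advertised address out of a master log line and reports whether it is a loopback address.

```python
import ipaddress
import re

# Matches the "slave(N)@IP:PORT" part of a master log line.
ADVERTISED = re.compile(r"slave\(\d+\)@(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def advertised_address(log_line):
    """Return the IPv4 address a slave advertised, or None if absent."""
    match = ADVERTISED.search(log_line)
    return match.group(1) if match else None


def is_loopback(addr):
    """True if addr is in the loopback range (127.0.0.0/8 for IPv4)."""
    return ipaddress.ip_address(addr).is_loopback


# Log line taken from the question, shortened after the hostname.
line = (
    "I0806 16:39:59.091508 940 master.cpp:3395] Registered slave "
    "20150806-163941-1027506442-5050-921-S3 at slave(1)@127.0.1.1:5051 (debian)"
)

addr = advertised_address(line)
print(addr, "loopback" if is_loopback(addr) else "routable")
```

A loopback result would mean the master is being told to reach the slave at an address that only resolves locally, which would be consistent with the slave working when it shares a VM with the master.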
Edit 2:
The slave logs can be found at http://pastebin.com/CXZUBHKr
The master logs can be found at http://pastebin.com/thYR1par
Recommended answer
I had a similar problem. My slave logs would be filled with
E0812 15:58:04.017990 2193 socket.hpp:107] Shutdown failed on fd=13: Transport endpoint is not connected [107]
and my master logs would contain
F0120 20:45:48.025610 12116 master.cpp:1083] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
The master would then die and a new election would occur. The killed master would be restarted by upstart (I am on a CentOS 6 box) and added back into the pool of potential masters, so the elected master would daisy-chain around my master nodes. Restarting the masters and slaves many times did nothing; the problem would consistently return within 1 minute of a master election.
The solution for me came from this Stack Overflow question (thanks) and a hint in a GitHub gist note.
The gist of it is that /etc/default/mesos-master must specify a quorum number (it needs to be correct for the number of Mesos masters, in my case 3):
MESOS_QUORUM=2
This seems odd to me, as I have the same information in the file /etc/mesos-master/quorum
But I added it to /etc/default/mesos-master, restarted the mesos-masters and slaves, and the problem has not returned.
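For reference, the quorum value used above (2 for 3 masters) is a strict majority of the master count. A minimal Python sketch of that arithmetic follows; the function name quorum_for is hypothetical and not part of Mesos.

```python
def quorum_for(masters):
    """Return the smallest strict majority of `masters`."""
    if masters < 1:
        raise ValueError("need at least one master")
    return masters // 2 + 1


# Print the quorum for a few common cluster sizes.
for n in (1, 3, 5):
    print(f"{n} masters -> MESOS_QUORUM={quorum_for(n)}")
```

With a single master this gives 1, which matches the --quorum=1 used in the question; with three masters it gives the 2 used here.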
Hope this helps.