Issue when joining serf nodes located in different Docker containers

Problem description

Context: Host is AWS-EC2 / Ubuntu 14.04.5 with Docker version 17.05.0-ce. Containers are built from publicly available repo image cbhihe/serf-alpine-bash. All containers are located on the same EC2 instance and share the same default bridge network with net-interface "docker0".

Trying to join nodes serfDC1 (id d4fd90692e18) and serfDC2 (id 6353e7f6134d), by passing cmds from the host's shell:

$ docker exec serfDC1 serf agent -node=Node1 -bind=0.0.0.0:7946
==> Starting Serf agent...
==> Starting Serf agent RPC...
==> Serf agent running!
         Node name: 'd4fd90692e18'
         Bind addr: '0.0.0.0:7946'
          RPC addr: '127.0.0.1:7373'
         Encrypted: false
          Snapshot: false
           Profile: lan
==> Log data will now stream in as it occurs:
    2017/06/04 00:01:10 [INFO] agent: Serf agent starting
    2017/06/04 00:01:10 [INFO] serf: EventMemberJoin: d4fd90692e18 127.0.0.1
    2017/06/04 00:01:11 [INFO] agent: Received event: member-join
    ^C

After discovering Node1's container's IP=172.17.0.4, I can issue the serf agent -join cmd to Node2:

$ docker exec serfDC2 serf agent -node=Node2 -join=172.17.0.4
==> Starting Serf agent...
==> Starting Serf agent RPC...
==> Serf agent running!
         Node name: '6353e7f6134d'
         Bind addr: '0.0.0.0:7946'
          RPC addr: '127.0.0.1:7373'
         Encrypted: false
          Snapshot: false
           Profile: lan
==> Joining cluster...(replay: false)
    Join completed. Synced with 1 initial agents
==> Log data will now stream in as it occurs:
    2017/06/04 00:18:35 [INFO] agent: Serf agent starting
    2017/06/04 00:18:35 [INFO] serf: EventMemberJoin: 6353e7f6134d 127.0.0.1
    2017/06/04 00:18:35 [INFO] agent: joining: [172.17.0.4] replay: false
    2017/06/04 00:18:35 [INFO] serf: EventMemberJoin: d4fd90692e18 127.0.0.1
    2017/06/04 00:18:35 [INFO] agent: joined: 1 nodes
    2017/06/04 00:18:36 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
    2017/06/04 00:18:36 [INFO] agent: Received event: member-join
    2017/06/04 00:18:37 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34876
    2017/06/04 00:18:37 [ERR] memberlist: Failed TCP fallback ping: EOF
    2017/06/04 00:18:37 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
    2017/06/04 00:18:38 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
    2017/06/04 00:18:39 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34879
    2017/06/04 00:18:39 [ERR] memberlist: Failed TCP fallback ping: EOF
    2017/06/04 00:18:40 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
    2017/06/04 00:18:41 [WARN] memberlist: Got ping for unexpected node 'd4fd90692e18' from=127.0.0.1:7946
    2017/06/04 00:18:42 [WARN] memberlist: Got ping for unexpected node d4fd90692e18 from=127.0.0.1:34881
    2017/06/04 00:18:42 [ERR] memberlist: Failed TCP fallback ping: EOF
    2017/06/04 00:18:42 [INFO] memberlist: Marking d4fd90692e18 as failed, suspect timeout reached (0 peer confirmations)
    2017/06/04 00:18:42 [INFO] serf: EventMemberFailed: d4fd90692e18 127.0.0.1
    2017/06/04 00:18:43 [INFO] agent: Received event: member-failed
    2017/06/04 00:18:44 [INFO] memberlist: Suspect d4fd90692e18 has failed, no acks received
    2017/06/04 00:19:05 [INFO] serf: attempting reconnect to d4fd90692e18 127.0.0.1:7946
   ^C

Resulted in failure to join as shown by:

$ docker exec serfDC2 serf members
6353e7f6134d  127.0.0.1:7946  alive
d4fd90692e18  127.0.0.1:7946  failed  
$ docker exec serfDC1 serf members
d4fd90692e18  127.0.0.1:7946  alive 
6353e7f6134d  127.0.0.1:7946  failed

I have been at this for quite some time now and am at my wit's end as to where I should turn. Hashicorp's and Docker's documentation do not seem to cover this aspect of the initial handshake between two serf agents in different containers.

Could somebody show me where I took a wrong turn? Any answer would be great, really. Thanks.

Solution

Serf nodes need to 'announce' themselves with a routable address. In your case they are telling each other 'hi, I'm localhost:...', so each one tries to answer to localhost. That cannot work, because each container has its own localhost.
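One way to spot this situation is to look for loopback addresses in the `serf members` output. The following is only a minimal sketch: `loopback_members` is a hypothetical helper name, and it assumes the three-column name/address/status layout shown in the question.

```shell
#!/bin/sh
# Hypothetical helper: print the names of members whose advertised
# address is a loopback address (127.x.x.x).
# Reads `serf members` output on stdin.
loopback_members() {
    awk '$2 ~ /^127\./ { print $1 }'
}

# Sample taken from the `serf members` output in the question.
sample='6353e7f6134d  127.0.0.1:7946  alive
d4fd90692e18  127.0.0.1:7946  failed'

printf '%s\n' "$sample" | loopback_members
```

With the sample from the question, both node names are printed, since both advertise 127.0.0.1. Against a live agent it could be used as `docker exec serfDC2 serf members | loopback_members`.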

There is an option to configure the agent to advertise the eth0 IP to the other nodes in the network: -iface. You then need to drop the -bind option. Those ports are the defaults, so there is no need to customize them.

So, for node1:

serf agent -node=Node1 -iface=eth0

And for node2:

serf agent -node=Node2 -join=172.17.0.2 -iface=eth0
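Note that the -join address must be Node1's actual container IP (172.17.0.4 in the question's setup, 172.17.0.2 in mine). As a rough sketch, the address can be looked up with `docker inspect` instead of being hard-coded. The helper names `container_ip` and `join_cmd` are made up for illustration, and the container names are the ones from the question.

```shell
#!/bin/sh
# Hypothetical helper: print a container's IP address on the network(s)
# it is attached to. Requires the Docker CLI and an existing container.
container_ip() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

# Hypothetical helper: build the `serf agent` command line for a joining node.
# $1 = node name, $2 = address of a node that is already running
join_cmd() {
    printf 'serf agent -node=%s -join=%s -iface=eth0\n' "$1" "$2"
}

# Show the command for the address used in the answer above.
join_cmd Node2 172.17.0.2
```

On the Docker host, the two helpers could be combined as `docker exec serfDC2 $(join_cmd Node2 "$(container_ip serfDC1)")`, assuming the Node1 agent is already running in serfDC1.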

From the docs:

-iface - This flag can be used to provide a binding interface. It can be used instead of -bind if the interface is known but not the address.

It's working properly for me:

Node1:

==> Log data will now stream in as it occurs:

    2017/06/04 01:56:40 [INFO] agent: Serf agent starting
    2017/06/04 01:56:40 [INFO] serf: EventMemberJoin: Node1 172.17.0.2
    2017/06/04 01:56:41 [INFO] agent: Received event: member-join
    2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node2 172.17.0.3
    2017/06/04 01:57:03 [INFO] agent: Received event: member-join

Node2:

==> Log data will now stream in as it occurs:

    2017/06/04 01:57:02 [INFO] agent: Serf agent starting
    2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node2 172.17.0.3
    2017/06/04 01:57:02 [INFO] agent: joining: [172.17.0.2] replay: false
    2017/06/04 01:57:02 [INFO] serf: EventMemberJoin: Node1 172.17.0.2
    2017/06/04 01:57:02 [INFO] agent: joined: 1 nodes
    2017/06/04 01:57:03 [INFO] agent: Received event: member-join


Edit:

If each container runs in its own VM (EC2 instance), the Docker networks of the instances are not interconnected, so you have to advertise the EC2 instance IP and expose the corresponding ports. Use -advertise:

-advertise - The advertise flag is used to change the address that we advertise to other nodes in the cluster.

Node1:

serf agent -node=Node1 -iface=eth0 -advertise=INSTANCE_IP

Node2:

serf agent -node=Node2 -join=NODE1_INSTANCE_IP -iface=eth0

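And remember to publish the Serf port in docker run. Serf gossips over both TCP and UDP, and -p publishes only TCP unless a protocol is given, so publish both:

docker run -p 7946:7946 -p 7946:7946/udp (...rest of the command...)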
