Docker's `docker0` device dies repeatedly (`inet addr` disappears)
Problem description
I'm running Docker version 1.4.1, build 5bc2ff8, on Ubuntu 14.04. When I `docker run` any container, after a few minutes my `docker0` bridge "dies", and the container can no longer reach the network. Before the connection dies, running `ifconfig` reports a `docker0` device with an `inet addr`, like:
docker0 Link encap:Ethernet HWaddr 56:84:7a:fe:97:99
inet addr:172.17.42.1 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: xxxx::xxxx:xxxx:xxxx:xxxx/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
[... etc.]
But after the connection dies, `ifconfig` shows that the IPv4 address has gone away:
docker0 Link encap:Ethernet HWaddr 56:84:7a:fe:97:99
inet6 addr: xxxx::xxxx:xxxx:xxxx:xxxx/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8116 errors:0 dropped:0 overruns:0 frame:0
TX packets:15995 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2444859 (2.4 MB) TX bytes:17440729 (17.4 MB)
Restarting Docker, e.g. with `sudo service docker restart`, brings the device back up, but all my containers die and the problem starts over again. I can't reliably get anything to run for more than a few minutes at a time, which is not long enough to even complete a `docker build` for most projects.
- What might be causing this?
- How do I diagnose it?
- What are the possible solutions?
Thanks!
Update: I can reliably trigger this `docker0`-dropping behavior simply by starting a container with `docker run -t -i ubuntu /bin/bash`, and then exiting with ctrl-d. When I do so, here's what I see in `/var/log/syslog`:
myhost kernel: docker0: port 1(veth80ddeaf) entered disabled state
myhost kernel: device veth80ddeaf left promiscuous mode
myhost kernel: docker0: port 1(veth80ddeaf) entered disabled state
myhost dhclient: Internet Systems Consortium DHCP Client 4.2.4
myhost dhclient: Copyright 2004-2012 Internet Systems Consortium.
myhost dhclient: All rights reserved.
myhost dhclient: For info, please visit https://www.isc.org/software/dhcp/
myhost dhclient:
myhost dhclient: Listening on LPF/docker0/56:84:7a:fe:97:99
myhost dhclient: Sending on LPF/docker0/56:84:7a:fe:97:99
myhost dhclient: Sending on Socket/fallback
myhost kernel: IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
Update #2: The frequency of failure seems to depend on how long the container runs. For example:
docker run -i -t ubuntu sleep 0
--> `docker0` "survives" ~100% of the time
docker run -i -t ubuntu sleep 1
--> `docker0` survives ~80% of the time
docker run -i -t ubuntu sleep 5
--> `docker0` survives ~0% of the time
Recommended answer
How do I diagnose it?
When `docker0` has an IP address, does it go away if you don't start any containers? If it persists indefinitely until you start a container, I would start by looking at the Docker logs as well as tailing the system logs when you start a container.
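One way to narrow down the system log while reproducing the problem is to keep only the lines that mention the bridge. The following is a minimal sketch, not a definitive tool: the helper name `iface_log_lines` is hypothetical, and the default log path is simply the `/var/log/syslog` file mentioned in the question.

```shell
# Print the lines of a log file that mention a given interface.
# Usage: iface_log_lines IFACE [LOGFILE]
# (iface_log_lines is a hypothetical helper name used for illustration.)
iface_log_lines() {
    iface="$1"
    logfile="${2:-/var/log/syslog}"   # default path taken from the question
    # -F treats the interface name as a fixed string rather than a regex.
    grep -F -- "$iface" "$logfile"
}

# To follow the log live while starting and stopping a container
# (left commented out because it runs until interrupted):
# tail -f /var/log/syslog | grep --line-buffered -F docker0
```

Any line from a process that is not expected to touch the bridge is a candidate for closer inspection.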
Does the IP address disappear at set intervals (e.g., every N minutes)? If so, I would look for logs from `cron` to see if some periodic task is responsible.
Are you running NetworkManager? Does disabling NetworkManager make the problem go away? I am running Docker on a system with NetworkManager without a problem, but I have `no-auto-default=*` set in my config, which may have an impact on this sort of thing.
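To check whether a given NetworkManager configuration already carries that setting, something like the sketch below could be used. The function name `has_no_auto_default` is hypothetical, and the default path `/etc/NetworkManager/NetworkManager.conf` is an assumption about where the configuration lives on the system in question.

```shell
# Succeed if a NetworkManager config file sets no-auto-default.
# Usage: has_no_auto_default [CONFFILE]
# (hypothetical helper; the default path below is an assumption)
has_no_auto_default() {
    conf="${1:-/etc/NetworkManager/NetworkManager.conf}"
    # Match an uncommented "no-auto-default=" line, allowing leading spaces.
    # A missing file counts as "not set".
    grep -Eq '^[[:space:]]*no-auto-default[[:space:]]*=' "$conf" 2>/dev/null
}

if has_no_auto_default; then
    echo "no-auto-default is set"
else
    echo "no-auto-default is not set (or the config file was not found)"
fi
```

This only reports the presence of the key; whether it actually changes how `docker0` is handled would still need to be confirmed by testing.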
Update
This is suspicious:
myhost dhclient: Internet Systems Consortium DHCP Client 4.2.4
myhost dhclient: Copyright 2004-2012 Internet Systems Consortium.
myhost dhclient: All rights reserved.
myhost dhclient: For info, please visit https://www.isc.org/software/dhcp/
myhost dhclient:
myhost dhclient: Listening on LPF/docker0/56:84:7a:fe:97:99
myhost dhclient: Sending on LPF/docker0/56:84:7a:fe:97:99
myhost dhclient: Sending on Socket/fallback
There should not be any `dhclient` process listening on `docker0`, and this is absolutely what is causing your IP address to disappear. If you are not explicitly running a DHCP client on this interface, this strongly suggests that NetworkManager is in fact trying to manage it. You said you disabled NetworkManager, but did you confirm that the process was stopped? What is the parent process of the `dhclient` that is listening on `docker0`? If you stop the `dhclient` process, does it get restarted? Does the problem go away?
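The parent-process question can be answered from `/proc` on Linux. The sketch below is one possible approach: the function name `parent_of` is hypothetical, and the commented example assumes that the interface name appears on the `dhclient` command line, which may not hold on every setup.

```shell
# Print "PPID NAME" for the parent of a given process ID.
# Reads /proc, so this is Linux-specific.
# (parent_of is a hypothetical helper name used for illustration.)
parent_of() {
    pid="$1"
    # The PPid field of /proc/PID/status holds the parent's process ID.
    ppid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status") || return 1
    # /proc/PPID/comm holds the parent's short command name, if it exists.
    printf '%s %s\n' "$ppid" "$(cat "/proc/$ppid/comm" 2>/dev/null)"
}

# Example: show the parent of every dhclient whose command line
# mentions docker0 (assumes the interface name is on the command line):
# for pid in $(pgrep -f 'dhclient.*docker0'); do parent_of "$pid"; done
```

If the reported parent is NetworkManager or an interface-management script, that points to whatever is launching the DHCP client on the bridge.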