Kubernetes: container creation gets stuck at ContainerCreating with flannel

Problem description

Context

I installed Docker following these instructions on my Ubuntu 18.04 LTS (Server) and later installed Kubernetes via kubeadm. After initializing (kubeadm init --pod-network-cidr=10.10.10.10/24) and joining a second node (I started with a two-node cluster), I cannot get my coredns pods, or the Web UI (Dashboard) I applied later, to actually reach the status Running.

As pod network I tried both Flannel (kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml) and Weave Net. Nothing changed: even after hours of waiting, the pods still show status ContainerCreating.

Question

Why doesn't the container creation work as expected and what might be the root cause for this? And most importantly: How do I solve this?

Edit

Summing up my answer below, here are the reasons why:

  • Docker used the cgroupfs cgroup driver instead of systemd
  • I did not configure iptables correctly
  • I used a wrong kubeadm init, since flannel's standard YAML requires --pod-network-cidr to be 10.244.0.0/16

Recommended answer

Since answering this question took me a lot of time, I wanted to share what got me out of it. There might be more code than strictly necessary, but I also want it all in one place in case I or someone else has to redo all the steps.



First it all started with Docker...

I figured out that it presumably all started with the way I installed Docker. Following the linked online instructions, I used sudo apt-get install docker.io to install Docker, which left it on its default cgroupfs cgroup driver, and added my user to the docker group with sudo usermod -aG docker $USER.

Well, judging by the official Kubernetes instructions, this was a mistake: systemd is the recommended cgroup driver!

So I completely purged everything I had ever done with Docker by following these great instructions from Mayur Bhandare:

sudo apt-get purge -y docker-engine docker docker.io docker-ce  
sudo apt-get autoremove -y --purge docker-engine docker docker.io docker-ce  
sudo rm -rf /var/lib/docker /etc/docker
sudo rm /etc/apparmor.d/docker
sudo groupdel docker
sudo rm -rf /var/run/docker.sock

# Reboot to be sure
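Before reinstalling, it may be worth confirming that the purge really removed everything. The sketch below is a hypothetical helper (the name `leftover_docker_pkgs` is an assumption, not part of any tool) that filters `dpkg -l` output for packages whose names start with docker or containerd and that are in state `ii` (installed) or `rc` (removed, but configuration files left behind).

```shell
# Hypothetical helper: list Docker-related packages dpkg still knows about.
# Reads `dpkg -l` output on stdin. "ii" = installed, "rc" = removed but
# config files remain (a purge should have cleared those as well).
leftover_docker_pkgs() {
  awk '($1 == "ii" || $1 == "rc") && $2 ~ /^(docker|containerd)/ { print $1, $2 }'
}

# Usage on the node:
#   dpkg -l | leftover_docker_pkgs    # no output means nothing is left
```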

Afterwards I reinstalled Docker the official way (keep in mind that this might change in the future):

# Install Docker CE
## Set up the repository:
### Install packages to allow apt to use a repository over HTTPS
apt-get update && apt-get install -y \
  apt-transport-https ca-certificates curl software-properties-common gnupg2

### Add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -

### Add Docker apt repository.
add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) \
  stable"

## Install Docker CE.
apt-get update && apt-get install -y \
  containerd.io=1.2.10-3 \
  docker-ce=5:19.03.4~3-0~ubuntu-$(lsb_release -cs) \
  docker-ce-cli=5:19.03.4~3-0~ubuntu-$(lsb_release -cs)

# Setup daemon.
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

mkdir -p /etc/systemd/system/docker.service.d

# Restart docker.
systemctl daemon-reload
systemctl restart docker

Note that this explicitly sets the cgroup driver to systemd!
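To confirm which cgroup driver the daemon.json from the step above configures, a small check like this can help. It is a sketch that assumes the `exec-opts` layout shown above; the function name `daemon_json_cgroup_driver` is hypothetical. It only inspects the file; on a live host, `docker info --format '{{.CgroupDriver}}'` reports what the running daemon actually uses.

```shell
# Hypothetical helper: print the cgroup driver set via
# "native.cgroupdriver=<name>" in a Docker daemon.json.
# Prints "unset" if the file or the option is missing, in which case
# Docker falls back to its built-in default.
daemon_json_cgroup_driver() {
  file="${1:-/etc/docker/daemon.json}"
  if [ ! -f "$file" ]; then
    echo "unset"
    return 0
  fi
  # Take the first native.cgroupdriver=... occurrence and keep the value.
  driver=$(grep -o 'native\.cgroupdriver=[a-z]*' "$file" | head -n 1 | cut -d= -f2)
  echo "${driver:-unset}"
}

# Usage on the node:
#   daemon_json_cgroup_driver                    # reads /etc/docker/daemon.json
#   docker info --format '{{.CgroupDriver}}'     # driver of the running daemon
```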



... and then it went on with Flannel...

Above I wrote that I ran sudo kubeadm init with --pod-network-cidr=10.10.10.10/24, since that was the IP of my master. Well, as pointed out here, not using the officially recommended --pod-network-cidr=10.244.0.0/16 results in errors, for example when using kubectl proxy, or in failed container creation when using the provided manifest (kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml). This is because 10.244.0.0/16 is hard-coded in the .yaml and is therefore mandatory, unless you change it in the .yaml yourself.
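To catch this kind of mismatch before running kubeadm init, one could compare the intended CIDR against the manifest. This is a minimal sketch, assuming the manifest has been downloaded locally and contains flannel's usual net-conf.json block with a "Network" key; both function names are hypothetical.

```shell
# Hypothetical helper: extract the "Network" value from the net-conf.json
# section embedded in a kube-flannel.yml manifest.
flannel_network() {
  sed -n 's/.*"Network": *"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed only if the CIDR intended for
# `kubeadm init --pod-network-cidr=...` equals flannel's configured Network.
cidr_matches_flannel() {
  pod_cidr="$1"
  manifest="$2"
  net=$(flannel_network "$manifest")
  if [ "$pod_cidr" = "$net" ]; then
    echo "OK: --pod-network-cidr matches flannel Network ($net)"
  else
    echo "MISMATCH: --pod-network-cidr=$pod_cidr but flannel expects $net"
    return 1
  fi
}

# Usage:
#   curl -fsSL -o kube-flannel.yml https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
#   cidr_matches_flannel 10.10.10.10/24 kube-flannel.yml   # reports a MISMATCH
```

Editing the "Network" value in the downloaded file instead, as mentioned above, would make the same check pass for a custom CIDR.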

To get rid of the wrong configuration, I did a full reset. Normally this can be done with sudo kubeadm reset followed by deleting the config with sudo rm -r ~/.kube/config. However, since I had messed things up so badly, I went further: I uninstalled and reinstalled kubeadm, and this time made sure the legacy iptables was in use (which I had also forgotten to do before...).

Here is a nice link on how to fully uninstall all kubeadm parts.

sudo kubeadm reset
sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni 'kube*'
sudo apt-get autoremove  
sudo rm -rf ~/.kube

For the sake of completeness, here is the reinstall as well:

# ensure legacy binaries are installed
sudo apt-get install -y iptables arptables ebtables

# switch to legacy versions
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy

# Install Kubernetes with kubeadm
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

#reboot
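Since forgetting the legacy switch was one of the causes, a quick check of which backend iptables actually uses may help after the reboot. This sketch assumes iptables 1.8 or newer, whose `iptables --version` output ends in `(legacy)` or `(nf_tables)`; the function name `iptables_backend` is hypothetical.

```shell
# Hypothetical helper: classify the backend reported by `iptables --version`.
# iptables 1.8+ appends "(legacy)" or "(nf_tables)" to its version string;
# older releases print neither suffix, which is reported as "unknown" here.
iptables_backend() {
  case "$1" in
    *"(legacy)"*)    echo "legacy" ;;
    *"(nf_tables)"*) echo "nf_tables" ;;
    *)               echo "unknown" ;;
  esac
}

# Usage on the node:
#   iptables_backend "$(sudo iptables --version)"
```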



... and finally it worked!

After the clean reinstallation I did the following:

# Initialize with correct cidr
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml

Then I was amazed by the result:

kubectl get pods --all-namespaces
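On a larger cluster it can help to filter that output for pods that are still stuck. A minimal sketch, assuming kubectl's default table layout (NAMESPACE, NAME, READY, STATUS, ...), where STATUS is the fourth column; `stuck_pods` is a hypothetical name.

```shell
# Hypothetical helper: list pods still stuck in ContainerCreating.
# Expects the default table printed by `kubectl get pods --all-namespaces`
# on stdin and prints them as namespace/name.
stuck_pods() {
  awk 'NR > 1 && $4 == "ContainerCreating" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get pods --all-namespaces | stuck_pods
```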

On a side note: this also resolved the /run/flannel/subnet.env: no such file or directory error that I had seen before these steps when describing the coredns pods that were never created.
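A quick way to check for that file on a node is sketched below. It assumes flannel's usual subnet.env format (KEY=value lines such as FLANNEL_NETWORK and FLANNEL_SUBNET); `flannel_node_subnet` is a hypothetical helper name.

```shell
# Hypothetical helper: report the pod subnet flannel assigned to this node.
# Fails (exit 1) if the env file is missing, which matches the
# "no such file or directory" error described above.
flannel_node_subnet() {
  env_file="${1:-/run/flannel/subnet.env}"
  if [ ! -f "$env_file" ]; then
    echo "missing: $env_file" >&2
    return 1
  fi
  # Print only the value of the FLANNEL_SUBNET=... line.
  sed -n 's/^FLANNEL_SUBNET=//p' "$env_file"
}

# Usage on a node:
#   flannel_node_subnet
```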
