How to get kube-dns working in Vagrant cluster using kubeadm and Weave
Problem description
I deployed a few VMs using Vagrant to test kubernetes:
master: 4 CPUs, 4GB RAM
node-1: 4 CPUs, 8GB RAM
Base image: Centos/7.
Networking: Bridged.
Host OS: Centos 7.2
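For reference, a minimal Vagrantfile sketch for this kind of two-VM setup could look like the following. This is an assumption-based reconstruction, not the original Vagrantfile: the box name, the bridged-network call, and the provider settings are illustrative.

```ruby
# Hypothetical Vagrantfile: a master and one node on CentOS 7 with bridged networking.
Vagrant.configure("2") do |config|
  config.vm.box = "centos/7"

  config.vm.define "master" do |master|
    master.vm.hostname = "master"
    master.vm.network "public_network"          # bridged adapter
    master.vm.provider "virtualbox" do |vb|
      vb.cpus   = 4
      vb.memory = 4096
    end
  end

  config.vm.define "node-1" do |node|
    node.vm.hostname = "node-1"
    node.vm.network "public_network"
    node.vm.provider "virtualbox" do |vb|
      vb.cpus   = 4
      vb.memory = 8192
    end
  end
end
```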
Deployed kubernetes using kubeadm by following kubeadm getting started guide. After adding the node to the cluster and installing Weave Net, I'm unfortunately not able to get kube-dns up and running as it stays in a ContainerCreating state:
[vagrant@master ~]$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-master 1/1 Running 0 1h
kube-system kube-apiserver-master 1/1 Running 0 1h
kube-system kube-controller-manager-master 1/1 Running 0 1h
kube-system kube-discovery-982812725-0tiiy 1/1 Running 0 1h
kube-system kube-dns-2247936740-46rcz 0/3 ContainerCreating 0 1h
kube-system kube-proxy-amd64-4d8s7 1/1 Running 0 1h
kube-system kube-proxy-amd64-sqea1 1/1 Running 0 1h
kube-system kube-scheduler-master 1/1 Running 0 1h
kube-system weave-net-h1om2 2/2 Running 0 1h
kube-system weave-net-khebq 1/2 CrashLoopBackOff 17 1h
I assume the problem is somehow related to the weave-net pod in CrashLoopBackOff state which resides on node-1:
[vagrant@master ~]$ kubectl describe pods --namespace=kube-system weave-net-khebq
Name: weave-net-khebq
Namespace: kube-system
Node: node-1/10.0.2.15
Start Time: Wed, 05 Oct 2016 07:10:39 +0000
Labels: name=weave-net
Status: Running
IP: 10.0.2.15
Controllers: DaemonSet/weave-net
Containers:
weave:
Container ID: docker://4976cd0ec6f971397aaf7fbfd746ca559322ab3d8f4ee217dd6c8bd3f6ed4f76
Image: weaveworks/weave-kube:1.7.0
Image ID: docker://sha256:1ac5304168bd9dd35c0ecaeb85d77d26c13a7d077aa8629b2a1b4e354cdffa1a
Port:
Command:
/home/weave/launch.sh
Requests:
cpu: 10m
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 05 Oct 2016 08:18:51 +0000
Finished: Wed, 05 Oct 2016 08:18:51 +0000
Ready: False
Restart Count: 18
Liveness: http-get http://127.0.0.1:6784/status delay=30s timeout=1s period=10s #success=1 #failure=3
Volume Mounts:
/etc from cni-conf (rw)
/host_home from cni-bin2 (rw)
/opt from cni-bin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kir36 (ro)
/weavedb from weavedb (rw)
Environment Variables:
WEAVE_VERSION: 1.7.0
weave-npc:
Container ID: docker://feef7e7436d2565182d99c9021958619f65aff591c576a0c240ac0adf9c66a0b
Image: weaveworks/weave-npc:1.7.0
Image ID: docker://sha256:4d7f0bd7c0e63517a675e352146af7687a206153e66bdb3d8c7caeb54802b16a
Port:
Requests:
cpu: 10m
State: Running
Started: Wed, 05 Oct 2016 07:11:04 +0000
Ready: True
Restart Count: 0
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kir36 (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
weavedb:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
cni-bin:
Type: HostPath (bare host directory volume)
Path: /opt
cni-bin2:
Type: HostPath (bare host directory volume)
Path: /home
cni-conf:
Type: HostPath (bare host directory volume)
Path: /etc
default-token-kir36:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-kir36
QoS Class: Burstable
Tolerations: dedicated=master:Equal:NoSchedule
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 3m 19 {kubelet node-1} spec.containers{weave} Normal Pulling pulling image "weaveworks/weave-kube:1.7.0"
1h 3m 19 {kubelet node-1} spec.containers{weave} Normal Pulled Successfully pulled image "weaveworks/weave-kube:1.7.0"
55m 3m 11 {kubelet node-1} spec.containers{weave} Normal Created (events with common reason combined)
55m 3m 11 {kubelet node-1} spec.containers{weave} Normal Started (events with common reason combined)
1h 14s 328 {kubelet node-1} spec.containers{weave} Warning BackOff Back-off restarting failed docker container
1h 14s 300 {kubelet node-1} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=weave pod=weave-net-khebq_kube-system(d1feb9c1-8aca-11e6-8d4f-525400c583ad)"
Listing the containers running on node-1 gives
[vagrant@node-1 ~]$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
feef7e7436d2 weaveworks/weave-npc:1.7.0 "/usr/bin/weave-npc" About an hour ago Up About an hour k8s_weave-npc.e6299282_weave-net-khebq_kube-system_d1feb9c1-8aca-11e6-8d4f-525400c583ad_0f0517cf
762cd80d491e gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD.d8dbe16c_weave-net-khebq_kube-system_d1feb9c1-8aca-11e6-8d4f-525400c583ad_cda766ac
8c3395959d0e gcr.io/google_containers/kube-proxy-amd64:v1.4.0 "/usr/local/bin/kube-" About an hour ago Up About an hour k8s_kube-proxy.64a0bb96_kube-proxy-amd64-4d8s7_kube-system_909e6ae1-8aca-11e6-8d4f-525400c583ad_48e7eb9a
d0fbb716bbf3 gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD.d8dbe16c_kube-proxy-amd64-4d8s7_kube-system_909e6ae1-8aca-11e6-8d4f-525400c583ad_d6b232ea
The logs for the first container show some connection errors:
[vagrant@node-1 ~]$ sudo docker logs feef7e7436d2
E1005 08:46:06.368703 1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:154: Failed to list *api.Pod: Get https://100.64.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:06.370119 1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:155: Failed to list *extensions.NetworkPolicy: Get https://100.64.0.1:443/apis/extensions/v1beta1/networkpolicies?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:06.473779 1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:153: Failed to list *api.Namespace: Get https://100.64.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.370451 1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:154: Failed to list *api.Pod: Get https://100.64.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.371308 1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:155: Failed to list *extensions.NetworkPolicy: Get https://100.64.0.1:443/apis/extensions/v1beta1/networkpolicies?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.474991 1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:153: Failed to list *api.Namespace: Get https://100.64.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
I lack the experience with Kubernetes and container networking to troubleshoot these issues further, so some hints are very much appreciated.

Observation: all pods/nodes report their IP as 10.0.2.15, which is the local Vagrant NAT address, not the actual IP address of the VMs.
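One way to confirm this (interface names are an assumption and vary by box) is to list the IPv4 addresses inside each VM; with Vagrant, eth0 is typically the NAT adapter carrying 10.0.2.15, while the bridged adapter (e.g. eth1) holds the VM's reachable address:

```shell
# List all IPv4 addresses on the VM; the NAT adapter usually shows 10.0.2.15,
# while the bridged adapter shows the address other machines can actually reach.
ip -4 addr show | grep inet
```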
Answer
Here is the recipe that worked for me (as of March 19th, 2017, using Vagrant and VirtualBox). The cluster is made of 3 nodes: 1 master and 2 workers.
1) Make sure you explicitly set the IP of your master node on init
kubeadm init --api-advertise-addresses=10.30.3.41
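Note: --api-advertise-addresses is the flag name used by early kubeadm releases; from kubeadm 1.6 onward it was replaced by the singular --apiserver-advertise-address, so on a newer version the equivalent call would be:

```shell
# Renamed flag on kubeadm >= 1.6 (takes a single address):
kubeadm init --apiserver-advertise-address=10.30.3.41
```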
2) Manually or during provisioning, add to each node's /etc/hosts the exact IP that you are configuring it to have. Here is a line you can add in your Vagrantfile (node naming convention I use: k8node-#{i}):
config.vm.provision :shell, inline: "sed 's/127\.0\.0\.1.*k8node.*/10.30.3.4#{i} k8node-#{i}/' -i /etc/hosts"
Example:
vagrant@k8node-1:~$ cat /etc/hosts
10.30.3.41 k8node-1
127.0.0.1 localhost
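To sanity-check what the sed expression from step 2 does, you can run it against a throwaway copy of a typical Vagrant /etc/hosts (the IP and hostname below just mirror the example above):

```shell
# Build a sample hosts file as Vagrant typically writes it, then apply the rewrite.
cat > /tmp/hosts.sample <<'EOF'
127.0.0.1 k8node-1 k8node-1
127.0.0.1 localhost
EOF

sed 's/127\.0\.0\.1.*k8node.*/10.30.3.41 k8node-1/' -i /tmp/hosts.sample
cat /tmp/hosts.sample
# 10.30.3.41 k8node-1
# 127.0.0.1 localhost
```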
3) Finally, all nodes will try to use the public IP of the cluster to connect to the master (not sure why this is happening...). Here is the fix for that.
First, find that IP by running the following on the master:
kubectl get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 10.96.0.1 <none> 443/TCP 1h
In each node, make sure that any process using 10.96.0.1 (in my case) is routed to the master, which is on 10.30.3.41.
So on each node (you can skip the master), use route to set the redirect:
route add 10.96.0.1 gw 10.30.3.41
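Note that a route added this way is lost on reboot. A sketch for persisting it, assuming the modern iproute2 syntax and, on CentOS 7, the per-interface route file (the interface name and file path are assumptions; adapt them to your distro):

```shell
# iproute2 equivalent of the route command above:
ip route add 10.96.0.1/32 via 10.30.3.41
# CentOS 7: persist the route across reboots for eth1 (interface name is an assumption):
echo "10.96.0.1/32 via 10.30.3.41" | sudo tee -a /etc/sysconfig/network-scripts/route-eth1
```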
After that, everything should work ok:
vagrant@k8node-1:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dummy-2088944543-rnl2f 1/1 Running 0 1h
kube-system etcd-k8node-1 1/1 Running 0 1h
kube-system kube-apiserver-k8node-1 1/1 Running 0 1h
kube-system kube-controller-manager-k8node-1 1/1 Running 0 1h
kube-system kube-discovery-1769846148-g8g85 1/1 Running 0 1h
kube-system kube-dns-2924299975-7wwm6 4/4 Running 0 1h
kube-system kube-proxy-9dxsb 1/1 Running 0 46m
kube-system kube-proxy-nx63x 1/1 Running 0 1h
kube-system kube-proxy-q0466 1/1 Running 0 1h
kube-system kube-scheduler-k8node-1 1/1 Running 0 1h
kube-system weave-net-2nc8d 2/2 Running 0 46m
kube-system weave-net-2tphv 2/2 Running 0 1h
kube-system weave-net-mp6s0 2/2 Running 0 1h
vagrant@k8node-1:~$ kubectl get nodes
NAME STATUS AGE
k8node-1 Ready,master 1h
k8node-2 Ready 1h
k8node-3 Ready 48m