How to get kube-dns working in Vagrant cluster using kubeadm and Weave


Problem description


I deployed a few VMs using Vagrant to test kubernetes:
master: 4 CPUs, 4GB RAM
node-1: 4 CPUs, 8GB RAM
Base image: Centos/7.
Networking: Bridged.
Host OS: Centos 7.2

Deployed kubernetes using kubeadm by following the kubeadm getting started guide. After adding the node to the cluster and installing Weave Net, I'm unfortunately not able to get kube-dns up and running, as it stays in a ContainerCreating state:

[vagrant@master ~]$ kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY     STATUS              RESTARTS   AGE
kube-system   etcd-master                      1/1       Running             0          1h
kube-system   kube-apiserver-master            1/1       Running             0          1h
kube-system   kube-controller-manager-master   1/1       Running             0          1h
kube-system   kube-discovery-982812725-0tiiy   1/1       Running             0          1h
kube-system   kube-dns-2247936740-46rcz        0/3       ContainerCreating   0          1h
kube-system   kube-proxy-amd64-4d8s7           1/1       Running             0          1h
kube-system   kube-proxy-amd64-sqea1           1/1       Running             0          1h
kube-system   kube-scheduler-master            1/1       Running             0          1h
kube-system   weave-net-h1om2                  2/2       Running             0          1h
kube-system   weave-net-khebq                  1/2       CrashLoopBackOff    17         1h

I assume the problem is somehow related to the weave-net pod in CrashLoopBackOff state, which resides on node-1:

[vagrant@master ~]$ kubectl describe pods --namespace=kube-system weave-net-khebq
Name:       weave-net-khebq
Namespace:  kube-system
Node:       node-1/10.0.2.15
Start Time: Wed, 05 Oct 2016 07:10:39 +0000
Labels:     name=weave-net
Status:     Running
IP:     10.0.2.15
Controllers:    DaemonSet/weave-net
Containers:
  weave:
    Container ID:   docker://4976cd0ec6f971397aaf7fbfd746ca559322ab3d8f4ee217dd6c8bd3f6ed4f76
    Image:      weaveworks/weave-kube:1.7.0
    Image ID:       docker://sha256:1ac5304168bd9dd35c0ecaeb85d77d26c13a7d077aa8629b2a1b4e354cdffa1a
    Port:       
    Command:
      /home/weave/launch.sh
    Requests:
      cpu:      10m
    State:      Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 05 Oct 2016 08:18:51 +0000
      Finished:     Wed, 05 Oct 2016 08:18:51 +0000
    Ready:      False
    Restart Count:  18
    Liveness:       http-get http://127.0.0.1:6784/status delay=30s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:
      /etc from cni-conf (rw)
      /host_home from cni-bin2 (rw)
      /opt from cni-bin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kir36 (ro)
      /weavedb from weavedb (rw)
    Environment Variables:
      WEAVE_VERSION:    1.7.0
  weave-npc:
    Container ID:   docker://feef7e7436d2565182d99c9021958619f65aff591c576a0c240ac0adf9c66a0b
    Image:      weaveworks/weave-npc:1.7.0
    Image ID:       docker://sha256:4d7f0bd7c0e63517a675e352146af7687a206153e66bdb3d8c7caeb54802b16a
    Port:       
    Requests:
      cpu:      10m
    State:      Running
      Started:      Wed, 05 Oct 2016 07:11:04 +0000
    Ready:      True
    Restart Count:  0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kir36 (ro)
    Environment Variables:  <none>
Conditions:
  Type      Status
  Initialized   True 
  Ready     False 
  PodScheduled  True 
Volumes:
  weavedb:
    Type:   EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium: 
  cni-bin:
    Type:   HostPath (bare host directory volume)
    Path:   /opt
  cni-bin2:
    Type:   HostPath (bare host directory volume)
    Path:   /home
  cni-conf:
    Type:   HostPath (bare host directory volume)
    Path:   /etc
  default-token-kir36:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-kir36
QoS Class:  Burstable
Tolerations:    dedicated=master:Equal:NoSchedule
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath       Type        Reason      Message
  --------- --------    -----   ----            -------------       --------    ------      -------
  1h        3m      19  {kubelet node-1}    spec.containers{weave}  Normal      Pulling     pulling image "weaveworks/weave-kube:1.7.0"
  1h        3m      19  {kubelet node-1}    spec.containers{weave}  Normal      Pulled      Successfully pulled image "weaveworks/weave-kube:1.7.0"
  55m       3m      11  {kubelet node-1}    spec.containers{weave}  Normal      Created     (events with common reason combined)
  55m       3m      11  {kubelet node-1}    spec.containers{weave}  Normal      Started     (events with common reason combined)
  1h        14s     328 {kubelet node-1}    spec.containers{weave}  Warning     BackOff     Back-off restarting failed docker container
  1h        14s     300 {kubelet node-1}                Warning     FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=weave pod=weave-net-khebq_kube-system(d1feb9c1-8aca-11e6-8d4f-525400c583ad)"

Listing the containers running on node-1 gives

[vagrant@node-1 ~]$ sudo docker ps
CONTAINER ID        IMAGE                                              COMMAND                  CREATED             STATUS              PORTS               NAMES
feef7e7436d2        weaveworks/weave-npc:1.7.0                         "/usr/bin/weave-npc"     About an hour ago   Up About an hour                        k8s_weave-npc.e6299282_weave-net-khebq_kube-system_d1feb9c1-8aca-11e6-8d4f-525400c583ad_0f0517cf
762cd80d491e        gcr.io/google_containers/pause-amd64:3.0           "/pause"                 About an hour ago   Up About an hour                        k8s_POD.d8dbe16c_weave-net-khebq_kube-system_d1feb9c1-8aca-11e6-8d4f-525400c583ad_cda766ac
8c3395959d0e        gcr.io/google_containers/kube-proxy-amd64:v1.4.0   "/usr/local/bin/kube-"   About an hour ago   Up About an hour                        k8s_kube-proxy.64a0bb96_kube-proxy-amd64-4d8s7_kube-system_909e6ae1-8aca-11e6-8d4f-525400c583ad_48e7eb9a
d0fbb716bbf3        gcr.io/google_containers/pause-amd64:3.0           "/pause"                 About an hour ago   Up About an hour                        k8s_POD.d8dbe16c_kube-proxy-amd64-4d8s7_kube-system_909e6ae1-8aca-11e6-8d4f-525400c583ad_d6b232ea

The logs for the first container show some connection errors:

[vagrant@node-1 ~]$ sudo docker logs feef7e7436d2
E1005 08:46:06.368703       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:154: Failed to list *api.Pod: Get https://100.64.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:06.370119       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:155: Failed to list *extensions.NetworkPolicy: Get https://100.64.0.1:443/apis/extensions/v1beta1/networkpolicies?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:06.473779       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:153: Failed to list *api.Namespace: Get https://100.64.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.370451       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:154: Failed to list *api.Pod: Get https://100.64.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.371308       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:155: Failed to list *extensions.NetworkPolicy: Get https://100.64.0.1:443/apis/extensions/v1beta1/networkpolicies?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused
E1005 08:46:07.474991       1 reflector.go:214] /home/awh/workspace/weave-npc/cmd/weave-npc/main.go:153: Failed to list *api.Namespace: Get https://100.64.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 100.64.0.1:443: getsockopt: connection refused

I lack the experience with kubernetes and container networking to troubleshoot these issues further, so some hints are very much appreciated. Observation: All pods/nodes report their IP as 10.0.2.15, which is the local Vagrant NAT address, not the actual IP address of the VMs.
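
For reference, the addresses can be inspected like this (assuming the usual Vagrant/VirtualBox layout, where eth0 is the 10.0.2.15 NAT adapter and the bridged interface is a second adapter):

# on the master: the IP column shows which address each pod registered with
kubectl get pods --all-namespaces -o wide

# on each VM: list the configured interfaces and their addresses
ip -4 addr show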

Solution

Here is the recipe that worked for me (as of March 19th, 2017, using Vagrant and VirtualBox). The cluster is made of 3 machines: 1 master and 2 nodes.

1) Make sure you explicitly set the IP of your master node on init

kubeadm init --api-advertise-addresses=10.30.3.41
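
(On kubeadm 1.6 and later this flag was renamed; the equivalent invocation there would be roughly the following, an adaptation rather than the original answer's command:)

kubeadm init --apiserver-advertise-address=10.30.3.41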

2) Manually or during provisioning, add to each node's /etc/hosts the exact IP that you are configuring it to have. Here is a line you can add in your Vagrantfile (node naming convention I use: k8node-$i); a fuller Vagrantfile sketch follows the example below:

config.vm.provision :shell, inline: "sed 's/127.0.0.1.*k8node.*/10.30.3.4#{i} k8node-#{i}/' -i /etc/hosts"

Example:

vagrant@k8node-1:~$ cat /etc/hosts
10.30.3.41 k8node-1
127.0.0.1   localhost
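
For context, the sed line is meant to run inside the per-node loop of the Vagrantfile, right after each node gets its static IP. A minimal sketch of that loop (the box name, the private_network setup and the 10.30.3.4x addressing are assumptions chosen to match the naming convention above, not the answer's exact file):

Vagrant.configure("2") do |config|
  (1..3).each do |i|
    config.vm.define "k8node-#{i}" do |node|
      node.vm.box      = "centos/7"                            # assumed box (the question used centos/7)
      node.vm.hostname = "k8node-#{i}"
      node.vm.network "private_network", ip: "10.30.3.4#{i}"   # the static IP kubeadm should advertise
      # Point the hostname at the static IP instead of 127.0.0.1 so kubeadm and
      # kubelet resolve it to the right interface.
      node.vm.provision :shell,
        inline: "sed 's/127.0.0.1.*k8node.*/10.30.3.4#{i} k8node-#{i}/' -i /etc/hosts"
    end
  end
end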

3) Finally, all nodes will try to use the cluster IP of the kubernetes service to connect to the master (not sure why this is happening...). Here is the fix for that.

First, find that cluster IP by running the following on the master.

kubectl get svc
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.96.0.1    <none>        443/TCP   1h

On each node, make sure that any traffic to 10.96.0.1 (in my case) is routed to the master, which is on 10.30.3.41.
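
To see how a node currently resolves that address, a quick check (an extra diagnostic, not part of the original recipe) is:

ip route get 10.96.0.1

Without the fix it will typically go out via the default NAT gateway instead of 10.30.3.41.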

So on each node (you can skip the master), use route to set up the redirect.

route add 10.96.0.1 gw 10.30.3.41
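
The route added this way does not persist across reboots. An equivalent iproute2 command, which could also be dropped into each node's provisioning script if you want it reapplied automatically (my addition, not part of the original answer), is:

# add or update a host route for the service IP via the master's address
ip route replace 10.96.0.1/32 via 10.30.3.41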

After that, everything should work ok:

vagrant@k8node-1:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                               READY     STATUS    RESTARTS   AGE
kube-system   dummy-2088944543-rnl2f             1/1       Running   0          1h
kube-system   etcd-k8node-1                      1/1       Running   0          1h
kube-system   kube-apiserver-k8node-1            1/1       Running   0          1h
kube-system   kube-controller-manager-k8node-1   1/1       Running   0          1h
kube-system   kube-discovery-1769846148-g8g85    1/1       Running   0          1h
kube-system   kube-dns-2924299975-7wwm6          4/4       Running   0          1h
kube-system   kube-proxy-9dxsb                   1/1       Running   0          46m
kube-system   kube-proxy-nx63x                   1/1       Running   0          1h
kube-system   kube-proxy-q0466                   1/1       Running   0          1h
kube-system   kube-scheduler-k8node-1            1/1       Running   0          1h
kube-system   weave-net-2nc8d                    2/2       Running   0          46m
kube-system   weave-net-2tphv                    2/2       Running   0          1h
kube-system   weave-net-mp6s0                    2/2       Running   0          1h


vagrant@k8node-1:~$ kubectl get nodes
NAME       STATUS         AGE
k8node-1   Ready,master   1h
k8node-2   Ready          1h
k8node-3   Ready          48m

