k8s: forwarding from public VIP to clusterIP with iptables
Question
I'm trying to understand in depth how forwarding from a publicly exposed load balancer's layer-2 VIP to a service's cluster IP works. I've read a high-level overview of how MetalLB does it, and I've tried to replicate it manually by setting up a keepalived/ucarp VIP and iptables rules. I must be missing something, however, as it doesn't work ;-]
Steps I took:
Created a cluster with kubeadm consisting of a master + 3 nodes running k8s-1.17.2 + calico-3.12 on libvirt/KVM VMs on a single computer. All VMs are in the 192.168.122.0/24 virtual network.
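For context, a cluster of this shape is typically bootstrapped roughly as follows; this is only a sketch under assumptions (the pod CIDR and the Calico 3.12 manifest URL are not taken from the question):

# on the master node
kubeadm init --pod-network-cidr=<pod-cidr>   # e.g. 192.168.0.0/16, Calico's default
kubectl apply -f https://docs.projectcalico.org/v3.12/manifests/calico.yaml

# on each of the 3 worker nodes, using the join command printed by "kubeadm init"
kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>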
Created a simple 2-pod deployment and exposed it as a NodePort service with externalTrafficPolicy set to Cluster:

$ kubectl get svc dump-request
NAME           TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
dump-request   NodePort   10.100.234.120   <none>        80:32292/TCP   65s

I've verified that I can reach it from the host machine on every node's IP at port 32292.
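For reference, a deployment and service of this shape could be created along these lines; the image and exact commands are assumptions, since the original manifests aren't shown in the question:

# hypothetical recreation of the setup; any small HTTP echo image will do
kubectl create deployment dump-request --image=<echo-image>
kubectl scale deployment dump-request --replicas=2
kubectl expose deployment dump-request --type=NodePort --port=80
# externalTrafficPolicy defaults to Cluster for newly created services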
Created a VIP with ucarp on all 3 nodes:

ucarp -i ens3 -s 192.168.122.21 -k 21 -a 192.168.122.71 -v 71 -x 71 -p dump -z -n -u /usr/local/sbin/vip-up.sh -d /usr/local/sbin/vip-down.sh

(example from knode1)

I've verified that I can ping the 192.168.122.71 VIP. I could even ssh through it to the VM that was currently holding the VIP.

Now, if kube-proxy was in iptables mode, I could also reach the service on its node port through the VIP at http://192.168.122.71:32292. However, to my surprise, in ipvs mode this always resulted in the connection timing out.
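The vip-up.sh/vip-down.sh scripts themselves aren't shown in the question; a minimal sketch of what such scripts typically do (assuming the VIP is simply added to and removed from ens3) would be:

#!/bin/sh
# vip-up.sh (assumed contents): attach the VIP when ucarp promotes this node to master
ip addr add 192.168.122.71/24 dev ens3

#!/bin/sh
# vip-down.sh (assumed contents): release the VIP when ucarp demotes this node
ip addr del 192.168.122.71/24 dev ens3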
Added an iptables rule on every node so that packets incoming to 192.168.122.71 are forwarded to the service's cluster IP 10.100.234.120:

iptables -t nat -A PREROUTING -d 192.168.122.71 -j DNAT --to-destination 10.100.234.120

(Later I also tried to narrow the rule down to just the relevant port, but it didn't change the results in any way:
iptables -t nat -A PREROUTING -d 192.168.122.71 -p tcp --dport 80 -j DNAT --to-destination 10.100.234.120:80
)
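One way to see whether such a rule is actually matching traffic is to look at its packet counters; this is a generic iptables check, not something from the question:

# list the PREROUTING chain of the nat table with packet/byte counters
iptables -t nat -L PREROUTING -n -v --line-numbers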
Results:

In iptables mode, all requests to http://192.168.122.71:80/ resulted in the connection timing out.

In ipvs mode it worked partially:
- If the 192.168.122.71 VIP was held by a node that had a pod on it, then about 50% of the requests succeeded and they were always served by the local pod. The app was also getting the real remote IP of the host machine (192.168.122.1). The other 50% (presumably the ones sent to the pod on another node) were timing out.
- If the VIP was held by a node without pods, then all requests timed out.
I've also checked whether it changes the results in any way to keep the rule on all nodes at all times vs. keeping it only on the node currently holding the VIP and deleting it when the VIP is released: the results were the same in both cases.
Does anyone know why it doesn't work and how to fix it? I'd appreciate any help with this :)
Answer
You also need to add a MASQUERADE rule, so that the source address is rewritten accordingly. For example:

iptables -t nat -A POSTROUTING -j MASQUERADE
Tested with ipvs.
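A blanket MASQUERADE of all outgoing NAT-ed traffic is rather broad; a narrower variant (my assumption, not part of the original answer) would be to masquerade only the traffic that was DNAT-ed to the service's cluster IP:

# only rewrite the source for packets headed to the dump-request cluster IP
iptables -t nat -A POSTROUTING -d 10.100.234.120 -p tcp --dport 80 -j MASQUERADE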