Dropping of connections with tcp_tw_recycle


Question

We have a setup with a lot of incoming connections (800 to 2400 per second) to a Linux box, and there is a NAT device between the clients and the server. As a result, many TIME_WAIT sockets are left in the system. To overcome that, we set tcp_tw_recycle to 1, but that led to incoming connections being dropped. After browsing the net, we found references explaining why frames are dropped when tcp_tw_recycle is combined with a NAT device.

We then tried setting tcp_tw_reuse to 1 instead, and it worked fine, without any issues, with the same setup and configuration.

But the documentation says that tcp_tw_recycle and tcp_tw_reuse should not be used when connections go through TCP state-aware nodes, such as firewalls, NAT devices or load balancers, because such connections may see dropped frames. The more connections there are, the more likely you are to see this issue.

1) Can tcp_tw_reuse be used in this type of scenario?
2) If not, which part of the Linux code prevents tcp_tw_reuse from being used in such a scenario?
3) In general, what is the difference between tcp_tw_recycle and tcp_tw_reuse?

Recommended answer

By default, when both tcp_tw_reuse and tcp_tw_recycle are disabled, the kernel makes sure that sockets in TIME_WAIT state remain in that state long enough to be sure that packets belonging to future connections will not be mistaken for late packets of the old connection.

When you enable tcp_tw_reuse, sockets in TIME_WAIT state can be reused before they expire, and the kernel will try to make sure that there is no collision regarding TCP sequence numbers. If you enable tcp_timestamps (which is what PAWS, Protection Against Wrapped Sequences, relies on), it will make sure that those collisions cannot happen. However, you need TCP timestamps to be enabled on both ends (at least, that's my understanding). See the definition of tcp_twsk_unique for the gory details.
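To make the reuse rule a bit more concrete, here is a rough Python model of the decision as I read it in older kernels. It is only a sketch: the names TimeWaitSocket, ts_recent_stamp and may_reuse are made up for illustration, the "more than one second" threshold is an assumption taken from my reading of tcp_twsk_unique, and, as far as I can tell, this path is only taken when the local side opens a new outgoing connection.

```python
from dataclasses import dataclass


@dataclass
class TimeWaitSocket:
    # Wall-clock second at which the last TCP timestamp was seen from the peer.
    # 0 means the old connection never negotiated timestamps.
    ts_recent_stamp: int


def may_reuse(tw: TimeWaitSocket, now: int, tw_reuse_enabled: bool) -> bool:
    """Simplified model: may a TIME_WAIT socket be reused for a new outgoing connection?"""
    if not tw_reuse_enabled:
        return False
    if tw.ts_recent_stamp == 0:
        # No timestamps on the old connection, so nothing distinguishes
        # late packets of the old connection from packets of the new one.
        return False
    # Require the clock to have moved on, so that the new connection's
    # timestamps are newer than anything the old connection sent.
    return now - tw.ts_recent_stamp > 1
```

In this model, a socket whose last timestamp was recorded at second 100 becomes reusable at second 102, and a socket that never saw timestamps is never reused early, which is why both ends need tcp_timestamps.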

When you enable tcp_tw_recycle, the kernel becomes much more aggressive, and will make assumptions about the timestamps used by remote hosts. It will track the last timestamp used by each remote host having a connection in TIME_WAIT state, and allow a socket to be reused if the timestamp has correctly increased. However, if the timestamp used by the host changes (i.e. warps back in time), the SYN packet will be silently dropped, and the connection won't be established (you will see an error similar to "connect timeout"). If you want to dive into kernel code, the definition of tcp_timewait_state_process might be a good starting point.

Now, timestamps should never go back in time, unless:

  • the host is rebooted (but then, by the time it comes back up, the TIME_WAIT sockets will probably have expired, so it will be a non-issue);
  • the IP address is quickly reused by something else (TIME_WAIT connections will stay a bit, but other connections will probably be struck by TCP RST and that will free up some space);
  • network address translation (or a smarty-pants firewall) is involved in the middle of the connection.

In the latter case, you can have multiple hosts behind the same IP address, and therefore different sequences of timestamps (or the timestamps are randomized at each connection by the firewall). In that case, some hosts will be randomly unable to connect, because they are mapped to a port for which the TIME_WAIT bucket of the server has a newer timestamp. That's why the docs tell you that "NAT devices or load balancers may start drop frames because of the setting".
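The NAT failure mode can be illustrated with a toy simulation. This is not kernel code: RecycleServer and its methods are hypothetical, the 60-second and 1-tick constants are assumptions standing in for the kernel's PAWS limits, and the model keeps just one remembered timestamp per source IP, which is the essence of the problem.

```python
PAWS_MSL = 60      # seconds a remembered timestamp stays relevant (assumed value)
PAWS_WINDOW = 1    # tolerated backwards step, in timestamp ticks (assumed value)


class RecycleServer:
    """Toy model of a server that remembers one timestamp per source IP."""

    def __init__(self):
        self.last_seen = {}  # source IP -> (tsval, wall-clock second)

    def close_connection(self, src_ip, tsval, now):
        # When a connection enters TIME_WAIT, remember the peer's latest timestamp.
        self.last_seen[src_ip] = (tsval, now)

    def accept_syn(self, src_ip, tsval, now):
        remembered = self.last_seen.get(src_ip)
        if remembered is None:
            return True
        last_tsval, last_time = remembered
        if now - last_time >= PAWS_MSL:
            return True  # the remembered value is too old to matter
        # Drop the SYN if its timestamp appears to have gone back in time.
        return last_tsval - tsval <= PAWS_WINDOW


server = RecycleServer()
nat_ip = "203.0.113.7"  # both hosts appear behind this one address

# Host A has been up for a long time, so its timestamp counter is large.
server.close_connection(nat_ip, tsval=5_000_000, now=10)

print(server.accept_syn(nat_ip, tsval=5_000_500, now=11))  # host A again -> True
print(server.accept_syn(nat_ip, tsval=12_000, now=12))     # host B, recently booted -> False
print(server.accept_syn(nat_ip, tsval=12_000, now=80))     # host B after the window -> True
```

Host B did nothing wrong; its SYN is dropped only because host A, behind the same address, happened to leave a larger timestamp behind a moment earlier.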

Some people recommend leaving tcp_tw_recycle alone, but enabling tcp_tw_reuse and lowering tcp_fin_timeout. I concur :-)
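Before changing anything, it may help to check what the box is currently set to. The snippet below is a small sketch that reads the relevant files under /proc/sys/net/ipv4; the function name read_tcp_settings and its base parameter are my own. A missing file is reported as None rather than as an error, since tcp_tw_recycle was removed in Linux 4.12 and does not exist on newer kernels.

```python
from pathlib import Path

SETTINGS = ("tcp_tw_reuse", "tcp_tw_recycle", "tcp_fin_timeout", "tcp_timestamps")


def read_tcp_settings(base="/proc/sys/net/ipv4"):
    """Return {name: value or None}; None means the file is absent or unreadable."""
    values = {}
    for name in SETTINGS:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tcp_settings().items():
        print(f"{name} = {value if value is not None else 'not available'}")
```

The base directory is a parameter only so that the function can be pointed at a test directory; on a real system the default path is the one to use.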
