Getting disconnection notification using TCP Keep-Alive on a write-blocked socket

Problem description

I use the TCP Keep-Alive option to detect dead connections. It works well for connections where I only read from the socket:

setsockopt(mysock, ...)   // set various keep-alive options

epoll_ctl(ep, mysock, {EPOLLIN|EPOLLERR|EPOLLHUP})
epoll_wait -> (returns after several seconds when the remote host's cable is disconnected)

epoll_wait returns with EPOLLIN|EPOLLHUP set on the socket without a problem.
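For reference, the "various keep-alive options" step might look like the sketch below. The question does not show which options it sets, so the helper name `enable_keepalive` and its parameters are hypothetical; the option names are the Linux ones (`SO_KEEPALIVE`, `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`).

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keep-alive probing on a TCP socket (Linux).
 *   idle_s  - seconds the connection must be idle before the first probe
 *   intvl_s - seconds between successive probes
 *   cnt     - unanswered probes before the kernel drops the connection
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0)
        return -1;
    return 0;
}
```

With, say, `enable_keepalive(mysock, 5, 2, 3)`, an idle connection whose peer has vanished would be reported dead roughly 5 + 3 * 2 seconds after the last traffic, which is consistent with the "several seconds" observed in the read-only case.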

However, if I write a lot of data to the socket until I get EAGAIN and then poll for both reading and writing, I don't get an error when I disconnect the cable:

setsockopt(mysock, ...)   // set various keep-alive options

while (send(mysock, ...) != -1)   // write until send() fails with errno == EAGAIN
   ;
epoll_ctl(ep, mysock, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP})
epoll_wait -> never returns, even when the remote host's cable is disconnected

  • How can this be solved?
  • Has anybody seen a similar problem?
  • Any possible direction?

Edit: Additional Information

When I monitor the traffic with Wireshark, in the first case (reading) I see a request for an ACK once every several seconds. In the second case I don't see any at all.

Solution

If you pull the network cable before all the data has been transmitted, the connection is not idle, so in some implementations the keepalive timer never starts. (Keep in mind that keepalive is not part of the core TCP specification; RFC 1122 makes it optional, so it is implemented inconsistently, if at all.) Instead, the stack keeps retransmitting the unacknowledged data. Because of the combination of exponential backoff and a large number of retries (on Linux, tcp_retries2 defaults to 15), it can take up to 30 minutes for the retransmissions to time out, and the keepalive timer does not come into play before then.
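To make the backoff arithmetic concrete, here is a rough sketch that adds up the waits for a given number of retries. It assumes Linux's minimum and maximum retransmission timeouts of 200 ms and 120 s and a timeout that simply doubles after each attempt; the function name `total_retransmit_wait_ms` is hypothetical, and a real connection starts from an RTT-derived timeout that is usually larger.

```c
/* Hypothetical helper: total time spent waiting across `retries`
 * retransmissions, counting the wait before the first retransmission and
 * the final wait after the last one (retries + 1 waits in total).
 * The timeout starts at rto_ms, doubles each time, and is capped at
 * rto_max_ms. */
long total_retransmit_wait_ms(long rto_ms, int retries, long rto_max_ms)
{
    long total = 0;
    long rto = (rto_ms > rto_max_ms) ? rto_max_ms : rto_ms;

    for (int i = 0; i <= retries; i++) {
        total += rto;
        /* exponential backoff with an upper bound */
        rto = (rto * 2 > rto_max_ms) ? rto_max_ms : rto * 2;
    }
    return total;
}
```

`total_retransmit_wait_ms(200, 15, 120000)` returns 924600, that is about 15.4 minutes in the best case. A larger starting timeout pushes the total up, which is how the figure can approach 30 minutes on slower paths.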

The workaround, if there is one, depends on the particular TCP implementation you are using. Linux kernels from version 2.6.37 (released 4 January 2011) onward implement the TCP_USER_TIMEOUT socket option. More info here.
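On Linux, setting that option could look like the following minimal sketch. The helper name `set_user_timeout` is hypothetical; the fallback `#define` is only there for older C library headers and uses the value from the Linux kernel headers.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18   /* Linux value, for headers that predate it */
#endif

/* Hypothetical helper: ask the kernel to give up on the connection when
 * transmitted data stays unacknowledged for more than timeout_ms
 * milliseconds. Returns 0 on success, -1 on failure (errno is set). */
int set_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof timeout_ms);
}
```

After `set_user_timeout(mysock, 10000)`, a write-blocked connection whose peer has disappeared should be closed by the kernel after about ten seconds, with the error reported as ETIMEDOUT, instead of waiting for all retransmissions to run out. It is worth verifying on the target kernel that the pending epoll_wait then wakes up with an error or hang-up event.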

The usual recommendation is to implement communication timeouts at the application level rather than use TCP-based keepalive anyway. See, for example, HTTP Keep-Alive.
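One simple form of an application-level timeout, applied to the epoll loop from the question, is to stop waiting indefinitely: pass a timeout to epoll_wait and treat "no event at all for N milliseconds" as a dead peer. The sketch below assumes that policy; the function name `wait_for_progress` is hypothetical, and a fuller design would also exchange application-level heartbeat messages.

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical helper: wait for one event on epfd, but no longer than
 * timeout_ms. Returns 1 if an event was stored in *ev, 0 if nothing
 * happened within the timeout (the caller should treat the peer as dead
 * and close the socket), or -1 on error with errno set. */
int wait_for_progress(int epfd, int timeout_ms, struct epoll_event *ev)
{
    for (;;) {
        int n = epoll_wait(epfd, ev, 1, timeout_ms);
        if (n > 0)
            return 1;           /* readable, writable, or error event */
        if (n == 0)
            return 0;           /* timed out: no progress at all */
        if (errno != EINTR)
            return -1;          /* real failure */
        /* Interrupted by a signal: retry. This sketch restarts the full
         * timeout instead of tracking the remaining time. */
    }
}
```

In the write-blocked case, a return value of 0 means the socket became neither writable nor readable in time, so the application can close it without waiting for the kernel to give up.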
