Getting disconnection notification using TCP Keep-Alive on a write-blocked socket


Problem description

I use the TCP Keep-Alive option to detect dead connections. It works well for connections where the socket is only being read:

setsockopt(mysock,...) // set various keep-alive options

epoll_ctl(ep,mysock,{EPOLLIN|EPOLLERR|EPOLLHUP},)
epoll_wait -> (returns after several seconds when the remote host's cable is disconnected)

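epoll_wait returns with EPOLLIN|EPOLLHUP on the socket without a problem.

However, if I write to the socket until I get EAGAIN and then poll for both reading and writing, I don't get an error when I disconnect the cable: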

setsockopt(mysock,...) // set various keep-alive options

while (send(mysock,...) != -1)   // keep writing until send() fails with errno == EAGAIN
   ;
epoll_ctl(ep,mysock,{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP},)
epoll_wait -> never returns, even when the cable of the remote host is disconnected

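  • How can this be solved?
  • Has anybody seen a similar problem?
  • Any possible direction?

Edit: Additional information

When I monitor the communication with Wireshark, in the first case (reading) I see a request for an ACK every few seconds. In the second case I don't see any at all.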

Solution

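If you pull the network connection before all the data has been transmitted, the connection is not idle, so in some implementations the keepalive timer does not start. (Keep in mind that keepalive is not a required part of the TCP specification, so it is implemented inconsistently, if at all.) The unacknowledged data is retransmitted instead. In general, because of the combination of exponential backoff and a large number of retries (on Linux, tcp_retries2 defaults to 15), it can take up to 30 minutes for the transmission retries to time out, and the keepalive timer does not start before then.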
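To see where a figure of this order comes from, the backoff can be added up by hand. The sketch below is only an illustration: the function name retransmit_budget_ms is hypothetical, and the 200 ms initial retransmission timeout and 120 s cap are assumptions based on Linux's usual minimum and maximum RTO. The real initial RTO depends on the measured round-trip time, which is why the actual duration varies.

```c
#include <assert.h>

/* Hypothetical helper: total time spent waiting when the first transmission
 * plus `retries` retransmissions all go unanswered. The timeout doubles
 * after every attempt and is capped at max_rto_ms. */
unsigned long long retransmit_budget_ms(unsigned int retries,
                                        unsigned long long initial_rto_ms,
                                        unsigned long long max_rto_ms)
{
    assert(initial_rto_ms > 0 && initial_rto_ms <= max_rto_ms);

    unsigned long long rto = initial_rto_ms;
    unsigned long long total = 0;

    for (unsigned int i = 0; i <= retries; i++) {
        total += rto;
        /* Double the timeout, saturating at the cap (written this way to
         * avoid overflow for large values). */
        rto = (rto > max_rto_ms / 2) ? max_rto_ms : rto * 2;
    }
    return total;
}
```

Under these assumptions, retransmit_budget_ms(15, 200, 120000) gives 924600 ms, roughly 15 minutes, which is a lower bound; a connection with a larger measured round-trip time starts from a larger timeout and takes correspondingly longer.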

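The workaround, if there is one, depends on the particular TCP implementation you are using. Newer versions of Linux (kernel 2.6.37, released 4 January 2011, and later) implement the TCP_USER_TIMEOUT socket option.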
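On a Linux system that supports it, setting the option might look like the sketch below. The helper name set_user_timeout is hypothetical, and the fallback definition of TCP_USER_TIMEOUT is an assumption for older C library headers that lack it. The option takes a value in milliseconds; once transmitted data has stayed unacknowledged for that long, the kernel closes the connection and reports ETIMEDOUT, so a socket blocked on writing should be woken with an error.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18   /* Linux value; assumed for headers that lack it */
#endif

/* Hypothetical helper: limit how long transmitted data may remain
 * unacknowledged before the kernel gives up on the connection.
 * Returns 0 on success, -1 on failure with errno set by setsockopt(). */
int set_user_timeout(int fd, unsigned int timeout_ms)
{
    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```

For example, set_user_timeout(mysock, 10000) would ask for a 10 second limit; the value is purely illustrative and should be chosen to suit the application.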

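The usual recommendation is to implement communication timeouts at the application level rather than rely on TCP-based keepalive anyway. See, for example, HTTP Keep-Alive.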
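One way to apply this to the epoll loop from the question is to pass a timeout to epoll_wait and treat a period with no events as a stalled connection. This is a minimal sketch, not a complete protocol: the helper name wait_for_progress and the stall_ms parameter are hypothetical, and the right stall limit depends on the application.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical helper: wait for any event on the sockets registered with
 * epfd. Returns 1 if an event arrived (stored in *ev), 0 if nothing
 * happened for stall_ms milliseconds, and -1 on error.
 * Note: after a signal interruption the wait restarts with the full
 * timeout, so the effective limit can be longer than stall_ms. */
int wait_for_progress(int epfd, int stall_ms, struct epoll_event *ev)
{
    assert(epfd >= 0 && ev != NULL);

    for (;;) {
        int n = epoll_wait(epfd, ev, 1, stall_ms);
        if (n > 0)
            return 1;          /* socket became readable, writable, or failed */
        if (n == 0)
            return 0;          /* no progress within the stall limit */
        if (errno != EINTR)
            return -1;         /* real error; errno is left for the caller */
    }
}
```

When the helper returns 0 while the socket is still blocked on writing, the application can decide that the peer is unreachable and close the socket itself, without waiting for the kernel's retransmission timeout.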
