What happens to a socket if the network breaks down

Question

Suppose a simple network model: A has successfully created a TCP connection to B, and they are communicating with each other like this:

A <---------->B

I know that if the program on A dies (for example, with a core dump), that will cause a RST packet to be sent to B. So any read attempt on B will lead to an EOF, and any write attempt on B will lead to SIGPIPE. Am I right?

Suppose, however, that the network on A's side has broken down (such as a cable or router failure). What happens to B's read and write attempts? In my situation, all the sockets have been set to non-blocking. As a result, is it impossible for me to detect the network error?

By the way, I notice that there is a socket option, SO_KEEPALIVE, which may be useful to me: http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/. But I wonder how much it will cost if I set the probing interval to 2~3 seconds (the default is 75 seconds)? It also seems that the interval is a global setting, so is this going to affect all the sockets on the machine?

Final question: say the network has broken down and any write attempt would cause EPIPE some time later. If, instead of trying to write, I put this socket into an epoll instance, what will happen then? Will epoll_wait return an EPOLLHUP or EPOLLERR event?

Recommended answer

There are numerous other ways a TCP connection can go dead undetected:

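  • Someone yanks out a network cable somewhere in between.
  • The computer at the other end gets nuked.
  • A NAT gateway in between silently drops the connection.
  • The OS at the other end crashes hard.
  • The FIN packets get lost.
  • Undetectable errors: a router between the endpoints may drop packets (including control packets).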

In all of these cases you find out about it when you try to write to the socket: once TCP has given up on the connection, the write fails, and writing to a connection that is already known to be broken raises SIGPIPE, which terminates your program by default.

With read() alone you cannot know whether the other side is still alive. That is why SO_KEEPALIVE is useful. Keepalive is non-invasive, and in most cases, if you are in doubt, you can turn it on without the risk of doing something wrong. But do remember that it generates extra network traffic, which can have an impact on routers and firewalls.

The system-wide keepalive settings affect all sockets on your machine too (you are correct), although on Linux the per-socket options TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT can override them for a single socket. And because SO_KEEPALIVE increases traffic and consumes CPU, it is best to install a SIGPIPE handler if there is any chance the application will ever write to a broken connection.

Also, use SO_KEEPALIVE at a reasonable place in the application. It is poor practice to keep it on for the whole duration of the connection; do use it, for example, while the server works for a long time on a client query.

The right probing interval depends on your application, or rather on your application-layer protocol.

With TCP keepalive enabled, you will detect the failure eventually, at the latest after a couple of hours with the default settings.
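The following is a sketch, assuming Linux, of how keepalive could be enabled and tuned for a single socket rather than for the whole machine. enable_keepalive and keepalive_detect_seconds are hypothetical helper names; the socket options themselves (SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) are the standard Linux ones.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive on one TCP socket and override its timing for that
   socket only (Linux-specific options).
     idle_s  - seconds the connection must be idle before the first probe
     intvl_s - seconds between unanswered probes
     cnt     - unanswered probes before the connection is declared dead
   Returns 0 on success, -1 with errno set on failure. */
int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0)
        return -1;
    return 0;
}

/* Rough upper bound, in seconds, on how long a dead peer can go
   unnoticed on an otherwise idle connection. */
int keepalive_detect_seconds(int idle_s, int intvl_s, int cnt)
{
    return idle_s + intvl_s * cnt;
}
```

With the usual Linux defaults (7200 seconds idle, 75 seconds between probes, 9 probes) the bound works out to 7875 seconds, a little over two hours. Something like enable_keepalive(fd, 10, 3, 3) would bring it down to about 19 seconds for that socket, at the cost of one small probe every few seconds while the connection is idle.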

If the network has broken down and, instead of trying to write, you put the socket into an epoll instance, look at the second argument of epoll_wait:

 n = epoll_wait (efd, events, MAXEVENTS, -1);

That array (events) is filled in with the flags of the events that occurred. Good practice is to check those flags, as follows.

n = epoll_wait (efd, events, MAXEVENTS, -1);
for (i = 0; i < n; i++)
{
    if ((events[i].events & EPOLLERR) ||
        (events[i].events & EPOLLHUP) ||
        (!(events[i].events & EPOLLIN)))
    {
        /* An error has occurred on this fd, or the socket is not
           ready for reading (why were we notified then?) */
        fprintf (stderr, "epoll error\n");
        close (events[i].data.fd);
        continue;
    }
    else if (sfd == events[i].data.fd)
    {
        /* We have a notification on the listening socket, which
           means one or more incoming connections. */

        /* Accept and register the new connections here. */
    }
}

EPOLLRDHUP means:
The stream socket peer closed the connection, or shut down the writing half of the connection. (This flag is especially useful for writing simple code to detect peer shutdown when using edge-triggered monitoring.)
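A small sketch of how EPOLLRDHUP might be combined with the checks above, assuming Linux. The helper names (watch_socket, peer_gone, pending_socket_error) are hypothetical. Note that epoll can only report what the kernel already knows: on an idle connection with no keepalive and no pending writes, a silently broken network produces no event at all.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>

/* Register fd for readability and peer-shutdown notification.
   EPOLLERR and EPOLLHUP are always reported and need not be requested.
   Returns 0 on success, -1 with errno set on failure. */
int watch_socket(int efd, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN | EPOLLRDHUP;
    ev.data.fd = fd;
    return epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev);
}

/* Returns 1 if the event mask says the connection failed or the peer
   closed (fully or only its writing half), 0 otherwise. */
int peer_gone(uint32_t events)
{
    return (events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) != 0;
}

/* Fetch and clear the error pending on the socket, for example
   ETIMEDOUT after keepalive probes went unanswered or ECONNRESET
   after a RST. Returns the errno value, 0 if no error is pending,
   or -1 if getsockopt itself failed. */
int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```

In the event loop, peer_gone(events[i].events) would replace the first part of the test, and pending_socket_error(events[i].data.fd) can be logged before the close to see why the connection ended.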
