What happens to a socket if the network breaks down

Question

Suppose a simple network model: A has successfully established a TCP connection to B, and they are communicating with each other like this:

A <----------> B

I know that if the program on A dies (for example, with a core dump), an RST packet is sent to B. Any read attempt on B then returns EOF, and any write attempt on B raises SIGPIPE. Am I right?

If, however, the network breaks down on A's side (for example, a cable or router failure), what happens to read/write attempts on B? In my situation, all the sockets have been set to non-blocking. Does that make it impossible for me to detect a network error?

By the way, I noticed the socket option SO_KEEPALIVE, which may be useful to me: http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/. But I wonder how much it will cost if I set the probing interval to 2~3 seconds (the default is 75 seconds). And the interval seems to be a global setting, so will this affect all the sockets on the machine?

Final question: say the network has broken down and any write attempt would cause EPIPE some time later. If, instead of trying to write, I put this socket into an epoll instance, what happens then? Will epoll_wait return an EPOLLHUP or EPOLLERR event?

Recommended answer

There are numerous other ways a TCP connection can go dead undetected:

  • Someone yanks out a network cable in between.
  • The computer at the other end gets nuked.
  • A NAT gateway in between silently drops the connection.
  • The OS at the other end crashes hard.
  • The FIN packets get lost.
  • Undetectable errors: a router between the endpoints may drop packets, including control packets.

In all of these cases you learn about the failure only when you try to write to the socket: the write eventually fails, and a write to a connection that is already known to be broken raises SIGPIPE, which terminates your program unless the signal is handled or ignored.

With read() alone you cannot tell whether the other side is still alive. That is why SO_KEEPALIVE is useful. Keepalive is non-invasive, and in most cases, if you are in doubt, you can turn it on without the risk of doing something wrong. But do remember that it generates extra network traffic, which can have an impact on routers and firewalls.

The system-wide keepalive settings do affect every socket on your machine that has SO_KEEPALIVE enabled (you are correct about that), although on Linux they can be overridden per socket with the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options. Because SO_KEEPALIVE increases traffic and consumes CPU, it is also best to install a SIGPIPE handler if there is any chance the application will write to a broken connection.
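
As a minimal sketch of the SIGPIPE advice above: one common approach is to ignore SIGPIPE (or pass MSG_NOSIGNAL on Linux) and check errno for EPIPE instead. The helper names ignore_sigpipe, send_checked and demo_write_after_peer_close are hypothetical, and the local socket pair only stands in for a TCP connection whose peer has gone away; with a silent network failure the error usually surfaces later.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Ignore SIGPIPE process-wide so that a write to a broken connection
   fails with EPIPE instead of terminating the program. */
static int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Hypothetical helper: send data and return 0 on success or the errno
   value on failure. MSG_NOSIGNAL (Linux) suppresses SIGPIPE for this
   call only, so it is safe even if SIGPIPE is not ignored. */
static int send_checked(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
    return n < 0 ? errno : 0;
}

/* Demonstration on a local socket pair (an assumption made to keep the
   example self-contained): once the peer end is closed, the next send
   fails. Returns the errno seen by the sender, or -1 on setup failure. */
static int demo_write_after_peer_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[1]);                          /* the peer goes away */
    int err = send_checked(sv[0], "x", 1); /* write to the broken pair */
    close(sv[0]);
    return err;
}
```

A caller would typically treat EPIPE (and errors such as ECONNRESET or ETIMEDOUT) as "connection lost", close the descriptor and reconnect if appropriate.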

Also, enable SO_KEEPALIVE only where it makes sense in the application. Using it for the whole lifetime of every connection is poor practice; do use it, for example, while the server spends a long time working on a client query.

The right probing interval depends on your application, or rather on your application-layer protocol.

With TCP keepalive enabled you will detect a dead connection eventually, although with the default settings that can take a couple of hours.
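
For a shorter detection time on a single connection, the following is a rough sketch that assumes Linux, where the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options override the system-wide defaults for one socket. The helper name enable_keepalive and the values in the usage note are illustrative assumptions, not recommendations.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive on one TCP socket and override
   the system-wide defaults for that socket only (Linux-specific options).
     idle:     seconds of silence before the first probe is sent
     interval: seconds between successive probes
     count:    unanswered probes before the connection is declared dead
   Returns 0 on success, -1 on failure (errno is set by setsockopt). */
static int enable_keepalive(int fd, int idle, int interval, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof interval) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
}
```

For example, enable_keepalive(fd, 10, 3, 3) would send the first probe after 10 idle seconds and give up roughly 9 seconds later if three probes go unanswered. Other sockets on the machine keep the system defaults.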

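Now suppose the network has broken down and, instead of trying to write, you have put the socket into an epoll instance. epoll reports nothing for that socket until the kernel itself has noticed the failure (for example through keepalive probes or retransmission timeouts). Once it has, look at the second argument of epoll_wait: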

 n = epoll_wait (efd, events, MAXEVENTS, -1);

It is filled in with the events that occurred on each ready descriptor. Good practice is to check these event codes carefully, as follows:

/* efd is the epoll instance, sfd the listening socket, and events an
   array of MAXEVENTS struct epoll_event, all set up elsewhere. */
n = epoll_wait(efd, events, MAXEVENTS, -1);
for (i = 0; i < n; i++)
{
    if ((events[i].events & EPOLLERR) ||
        (events[i].events & EPOLLHUP) ||
        (!(events[i].events & EPOLLIN)))
    {
        /* An error has occurred on this fd, or the socket is not
           ready for reading (why were we notified then?) */
        fprintf(stderr, "epoll error\n");
        close(events[i].data.fd);
        continue;
    }
    else if (sfd == events[i].data.fd)
    {
        /* We have a notification on the listening socket, which
           means one or more incoming connections. */

        /* Accept and handle the new connections here. */
    }
}

EPOLLRDHUP means:
Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using edge-triggered monitoring.)
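
The event checks above could be combined with EPOLLRDHUP roughly as follows. This is a sketch under stated assumptions: the helper names watch_socket, pending_socket_error and demo_peer_close_events are hypothetical, and a local socket pair stands in for a TCP connection so the example is self-contained. Note that EPOLLRDHUP has to be requested explicitly, whereas EPOLLERR and EPOLLHUP are always reported.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: register fd for readability plus peer shutdown.
   EPOLLERR and EPOLLHUP do not need to be requested. */
static int watch_socket(int efd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLRDHUP;
    ev.data.fd = fd;
    return epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: fetch (and clear) the pending error on a socket,
   typically after EPOLLERR was reported. Returns the errno value,
   0 if no error is pending, or -1 if getsockopt itself fails. */
static int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}

/* Demonstration: close one end of a local socket pair and return the
   event mask epoll reports for the other end (0 on failure). */
static uint32_t demo_peer_close_events(void)
{
    int sv[2];
    uint32_t mask = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 0;
    int efd = epoll_create1(0);
    if (efd >= 0 && watch_socket(efd, sv[0]) == 0) {
        struct epoll_event out;
        close(sv[1]);                 /* the peer goes away */
        sv[1] = -1;
        if (epoll_wait(efd, &out, 1, 1000) == 1)
            mask = out.events;
    }
    if (efd >= 0)
        close(efd);
    close(sv[0]);
    if (sv[1] >= 0)
        close(sv[1]);
    return mask;
}
```

In an event loop, a mask containing EPOLLRDHUP or EPOLLHUP would usually mean "drain any remaining data, then close", while EPOLLERR would be followed by a call such as pending_socket_error(fd) to learn the cause (for example ETIMEDOUT after a keepalive or retransmission timeout).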
