Do Windows TCP sockets have SO_KEEPALIVE enabled by default?


Question

I've encountered a strange bug with TCP sockets. It seems that SO_KEEPALIVE is enabled on all sockets by default.

I wrote a short test case that creates a socket and connects to a server. Immediately after the connect, I check SO_KEEPALIVE with getsockopt. The value is non-zero, which according to MSDN means keep-alive is enabled. Maybe I'm misunderstanding this.

I recently had a strange bug where a server disconnected twice in a row. Some clients were in a state where they had sent logon information and were waiting for a response. Even though there was an overlapped WSARecv posted to the socket connected to the server, no completion was posted to notify the client that the server crashed, so I'm assuming the socket wasn't fully closed.

Roughly 2 hours later (actually about 1 hour, 59 minutes, and 19 seconds), a completion packet was posted for the read, notifying the client that the connection is no longer open. This is where I started to suspect SO_KEEPALIVE.

I'm trying to understand why this happened. It caused a bit of an issue because clients who lose their connection for any reason are supposed to automatically reconnect to the server; in this case, because no disconnect was notified, the client didn't reconnect until 2 hours later.

An obvious fix is to add a timeout, but I'd like to know how this situation could occur.

SO_KEEPALIVE is not set on the socket by my application server or client.

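// Error checking is removed for this snippet, but all winsock calls succeed.
#include <winsock2.h>
#include <iostream>
#pragma comment(lib, "Ws2_32.lib")  // link against Winsock (MSVC)

int main() {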
    WORD wVersionRequested;
    WSADATA wsaData;
    int err;

    wVersionRequested = MAKEWORD(2, 2);
    err = WSAStartup(wVersionRequested, &wsaData);

    SOCKET foo = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, 0, 0, 0);

    DWORD optval;
    int optlen = sizeof(optval);
    int test = 0;
    test = getsockopt(foo, SOL_SOCKET, SO_KEEPALIVE, (char*)&optval, &optlen);
    std::cout << "Returned " << optval << std::endl;

    sockaddr_in clientService; 
    clientService.sin_family = AF_INET;
    clientService.sin_addr.s_addr = inet_addr("127.0.0.1");
    clientService.sin_port = htons(446);

    connect(foo, (SOCKADDR*) &clientService, sizeof(clientService));

    test = getsockopt(foo, SOL_SOCKET, SO_KEEPALIVE, (char*)&optval, &optlen);
    std::cout << "Returned " << optval << std::endl;

    std::cin.get();
    return 0;
}

// Example output:
// Returned 2883584
// Returned 2883584
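
One thing worth checking before reading too much into that output: 2883584 is 0x002C0000, so its low byte is zero, and `optval` is never initialized in the test case. `getsockopt` updates `optlen` to the number of bytes it actually wrote, so if the stack wrote fewer than four bytes for this boolean option, the rest of the printed value would be leftover stack contents. The helper below is a hypothetical, portable sketch (`option_enabled` is not a Winsock function) that looks only at the bytes reported as written. Whether Windows really returns a short length here is an assumption to verify on your own machine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decide whether a boolean socket option is set,
// looking only at the first `written` bytes of the buffer, i.e. the
// length that getsockopt stored back into optlen.
bool option_enabled(const void* buf, int written) {
    const unsigned char* p = static_cast<const unsigned char*>(buf);
    for (int i = 0; i < written; ++i) {
        if (p[i] != 0) return true;  // any non-zero written byte means "on"
    }
    return false;  // every byte that was actually written is zero
}
```

In the test case, this would amount to zero-initializing `optval` and printing `optlen` next to the value after each `getsockopt` call.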

Recommended answer

Firstly, run your test on a clean installation of the operating system in a VM. I suspect that something else you have installed may have changed the keep-alive setting.

Secondly, I doubt that keep-alive being enabled is the cause of your problem. If keep-alive weren't enabled, you would never have received a connection-closure notification from that pending read. TCP is supposed to work like that: intermediate routers can go away and come back without you knowing or caring. The only time you are informed of a failure is when you try to send and the connection is broken (or, in this case, when you try to send and the server has bounced). Because keep-alive was enabled, at the 1 hr 59 min mark the TCP stack transmitted a keep-alive probe and noticed that the connection was down. Without keep-alive, you would have had to wait until YOU transmitted something.
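
The roughly two-hour delay fits how keep-alive is timed: the stack waits for the connection to sit idle for a configured period, then sends probes at a shorter interval until one is answered or the probe count runs out. Here is a small worked calculation, assuming the commonly documented Windows defaults of a 2-hour idle time, a 1-second probe interval, and 10 probes. These numbers are assumptions, so check the values in effect on your system; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical model of keep-alive timing, for arithmetic only.
struct KeepAliveTiming {
    std::chrono::seconds idle;      // idle time before the first probe
    std::chrono::seconds interval;  // gap between unanswered probes
    int probes;                     // probes sent before giving up
};

// Upper bound on the time from the last traffic on the connection until
// the stack reports failure, assuming no probe is ever answered.
std::chrono::seconds detection_delay(const KeepAliveTiming& t) {
    return t.idle + t.interval * t.probes;
}
```

With the assumed defaults the bound is 7210 seconds, measured from the last traffic rather than from the crash. A peer that answers the first probe with a reset would be detected right at the idle mark, which is consistent with a notification arriving slightly under two hours after the server went down.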

If your clients need to know when the connection goes down, it's better to ignore keep-alive completely (as you can see, it affects the whole machine even when you're not the one who enabled it, and to me that makes it a poor solution). If you can, add an application-level ping and/or timeout to your protocol. So, perhaps, every command expects a response within 30 seconds and you send a ping from the server every minute. You'll then find out about dead connections as quickly as you like, and you can disconnect and reconnect at that point.
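
A minimal sketch of that application-level ping and timeout bookkeeping follows. `LivenessMonitor` and its method names are hypothetical, not part of any library or of the framework mentioned in this answer. The class only tracks timestamps supplied by the caller, so it can sit beside whatever I/O model you already use.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical liveness tracker for one connection. The caller feeds it
// timestamps; it never touches a socket itself.
class LivenessMonitor {
public:
    LivenessMonitor(Clock::time_point now,
                    std::chrono::seconds ping_every,
                    std::chrono::seconds dead_after)
        : last_rx_(now), last_ping_(now),
          ping_every_(ping_every), dead_after_(dead_after) {}

    // Call whenever any bytes arrive from the peer.
    void on_receive(Clock::time_point now) { last_rx_ = now; }

    // True when it is time to send another application-level ping;
    // records the ping as sent when it returns true.
    bool should_ping(Clock::time_point now) {
        if (now - last_ping_ < ping_every_) return false;
        last_ping_ = now;
        return true;
    }

    // True once nothing has been received for at least dead_after.
    bool is_dead(Clock::time_point now) const {
        return now - last_rx_ >= dead_after_;
    }

private:
    Clock::time_point last_rx_;
    Clock::time_point last_ping_;
    std::chrono::seconds ping_every_;
    std::chrono::seconds dead_after_;
};
```

A periodic timer would call `should_ping` and `is_dead` with `Clock::now()`; when `is_dead` returns true, the client aborts the socket and reconnects instead of waiting for the stack to notice.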

This approach has worked well in my server framework; in fact I have a standard 'async read timeout' connection filter and a 'connection re-establishment' filter, which make it trivial to ensure that connections are always live. All the read timeout does is abort the existing connection; the connection re-establishment code then kicks in to recreate the connection, just as it would if the connection had been closed for any other reason.
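
The shape of that design can be sketched as a single close path shared by the read timeout and every other failure. The sketch below is hypothetical and is not the framework's actual code: `ManagedConnection` and the `reconnect` callback are illustrative names, and the callback stands in for whatever really opens a new socket.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical connection wrapper; `reconnect` stands in for whatever
// code actually opens a new socket.
class ManagedConnection {
public:
    explicit ManagedConnection(std::function<void()> reconnect)
        : reconnect_(std::move(reconnect)) {}

    // Read-timeout filter: does nothing except abort the connection.
    void on_read_timeout() { close("read timeout"); }

    // Any other failure (reset, peer close, ...) arrives here.
    void on_error(const std::string& reason) { close(reason); }

    const std::string& last_close_reason() const { return last_reason_; }

private:
    // Single close path, so re-establishment does not care why it closed.
    void close(const std::string& reason) {
        last_reason_ = reason;
        reconnect_();
    }

    std::function<void()> reconnect_;
    std::string last_reason_;
};
```

Because the timeout only aborts, the reconnection logic needs no special case for it.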
