Socket recv() hang on large message with MSG_WAITALL

Question

I have an application that reads large files from a server and hangs frequently on a particular machine. It has worked successfully under RHEL5.2 for a long time. We have recently upgraded to RHEL6.1 and it now hangs regularly.

I have created a test app that reproduces the problem. It hangs approx 98 times out of 100.

#include <errno.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/param.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/time.h>

int mFD = 0;

void open_socket()
{
  struct addrinfo hints, *res;
  memset(&hints, 0, sizeof(hints));
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_family = AF_INET;

  if (getaddrinfo("localhost", "60000", &hints, &res) != 0)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  mFD = socket(res->ai_family, res->ai_socktype, res->ai_protocol);

  if (mFD == -1)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  if (connect(mFD, res->ai_addr, res->ai_addrlen) < 0)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  freeaddrinfo(res);
}

void read_message(int size, void* data)
{
  char *p = data;  /* byte pointer: arithmetic on void* is a GNU extension */
  int bytesLeft = size;
  int numRd = 0;

  while (bytesLeft != 0)
  {
    fprintf(stderr, "reading %d bytes\n", bytesLeft);

    /* Replacing MSG_WAITALL with 0 works fine */
    int num = recv(mFD, p, bytesLeft, MSG_WAITALL);

    if (num == 0)
    {
      /* the peer closed the connection */
      break;
    }
    else if (num < 0 && errno != EINTR)
    {
      fprintf(stderr, "Exit %d\n", __LINE__);
      exit(1);
    }
    else if (num > 0)
    {
      numRd += num;
      p += num;
      bytesLeft -= num;
      fprintf(stderr, "read %d bytes - remaining = %d\n", num, bytesLeft);
    }
  }

  fprintf(stderr, "read total of %d bytes\n", numRd);
}

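int main(int argc, char **argv)
{
  /* the byte count is a required argument */
  if (argc < 2)
  {
    fprintf(stderr, "usage: %s <num_bytes>\n", argv[0]);
    return 1;
  }

  open_socket();

  uint32_t raw_len = atoi(argv[1]);
  char raw[raw_len];

  read_message(raw_len, raw);

  return 0;
}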

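Some notes from my testing:

  • If "localhost" maps to the loopback address 127.0.0.1, the app hangs on the call to recv() and never returns.
  • If "localhost" maps to the IP of the machine, so that packets are routed via the Ethernet interface, the app completes successfully.
  • When I experience a hang, the server sends a "TCP Window Full" message and the client responds with a "TCP ZeroWindow" message (see image and attached tcpdump capture). From this point it hangs forever, with the server sending keep-alives and the client sending ZeroWindow messages. The client never seems to expand its window, which would allow the transfer to complete.
  • During the hang, if I examine the output of "netstat -a", there is data in the server's send queue but the client's receive queue is empty.
  • If I remove the MSG_WAITALL flag from the recv() call, the app completes successfully.
  • The hang only arises when using the loopback interface on one particular machine. I suspect that this may all be related to timing dependencies.
  • As I reduce the size of the "file", the likelihood of the hang occurring decreases.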

The source for the test app can be found here:

socket test source

The tcpdump capture from the loopback interface can be found here:

tcpdump capture

I reproduce the issue by issuing the following commands:

>  gcc socket_test.c -o socket_test
>  perl -e 'for (1..6000000){ print "a" }' | nc -l 60000
>  ./socket_test 6000000

This sends 6000000 bytes to the test app, which tries to read the data using a single call to recv().

I would love to hear any suggestions on what I might be doing wrong or any further ways to debug the issue.

Recommended answer

MSG_WAITALL should block until all data has been received. From the manual page on recv:

This flag requests that the operation block until the full request is satisfied.

However, the buffers in the network stack are probably not large enough to hold the whole message, which would explain the "TCP Window Full" and "TCP ZeroWindow" messages in the capture. The client's network stack simply can't hold that much data.

The solution is to increase the buffer size (the SO_RCVBUF option to setsockopt), to split the message into smaller pieces, or to receive smaller chunks and put them into your own buffer. The last is what I would recommend.
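
For the first option, a minimal sketch of requesting a larger receive buffer might look like the following. The helper name set_rcvbuf is made up for illustration; setsockopt, getsockopt and SO_RCVBUF are the standard socket API. The kernel treats the request as a hint: on Linux it doubles the value to allow for bookkeeping overhead and caps it at net.core.rmem_max, so the sketch reads the size back instead of assuming it was granted.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of the original test app): ask for a
 * receive buffer of `bytes` bytes and return the size the kernel
 * reports afterwards, or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    assert(bytes > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;

    /* Read the value back: the kernel may adjust or cap the request. */
    int granted = 0;
    socklen_t len = sizeof(granted);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        return -1;

    return granted;
}
```

In the test app this would be called on mFD between socket() and connect(), since for TCP the buffer size should be set before the connection is established. A buffer as large as the whole 6000000-byte message may well exceed the system limit, which is one reason the chunked approach is the safer choice.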

I see in your code that you already do what I suggested (read in a loop into your own buffer), so just remove the MSG_WAITALL flag and it should work.

Oh, and when recv returns zero, that means the other end has closed the connection, and you should close your end too.
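
Putting these points together, a read loop without MSG_WAITALL could be sketched as below. The name recv_all and its return convention are assumptions made for this example, not something from the question's code. Each recv() call is allowed to return whatever has arrived, the loop accumulates it in the caller's buffer, and a return value of zero is treated as the peer closing the connection.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `size` bytes from `fd` into `buf`
 * using plain recv() calls (flags = 0). Returns the number of bytes
 * read, which is less than `size` only if the peer closed the
 * connection early, or -1 on error. */
ssize_t recv_all(int fd, void *buf, size_t size)
{
    assert(buf != NULL);

    char *p = buf;
    size_t left = size;

    while (left > 0) {
        ssize_t n = recv(fd, p, left, 0);
        if (n == 0)
            break;                  /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    return (ssize_t)(size - left);
}
```

A caller that gets back fewer bytes than it asked for knows the other end has closed, and should then call close() on its own descriptor.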
