Socket recv() hang on large message with MSG_WAITALL

Problem description

I have an application that reads large files from a server and hangs frequently on a particular machine. It has worked successfully under RHEL5.2 for a long time. We have recently upgraded to RHEL6.1 and it now hangs regularly.

I have created a test app that reproduces the problem. It hangs approx 98 times out of 100.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/param.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/time.h>

int mFD = 0;

void open_socket()
{
  struct addrinfo hints, *res;
  memset(&hints, 0, sizeof(hints));
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_family = AF_INET;

  if (getaddrinfo("localhost", "60000", &hints, &res) != 0)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  mFD = socket(res->ai_family, res->ai_socktype, res->ai_protocol);

  if (mFD == -1)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  if (connect(mFD, res->ai_addr, res->ai_addrlen) < 0)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  freeaddrinfo(res);
}

void read_message(int size, void* data)
{
  /* Use a byte pointer: arithmetic on void* is a GNU extension, not standard C */
  char *cursor = data;
  int bytesLeft = size;
  int numRd = 0;

  while (bytesLeft != 0)
  {
    fprintf(stderr, "reading %d bytes\n", bytesLeft);

    /* Replacing MSG_WAITALL with 0 works fine */
    ssize_t num = recv(mFD, cursor, bytesLeft, MSG_WAITALL);

    if (num == 0)
    {
      /* The peer closed the connection before all data arrived */
      break;
    }
    else if (num < 0 && errno != EINTR)
    {
      fprintf(stderr, "Exit %d\n", __LINE__);
      exit(1);
    }
    else if (num > 0)
    {
      numRd += num;
      cursor += num;
      bytesLeft -= num;
      fprintf(stderr, "read %zd bytes - remaining = %d\n", num, bytesLeft);
    }
    /* num < 0 with errno == EINTR: interrupted by a signal, so retry */
  }

  fprintf(stderr, "read total of %d bytes\n", numRd);
}

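int main(int argc, char **argv)
{
  if (argc < 2 || atoi(argv[1]) <= 0)
  {
    fprintf(stderr, "usage: %s <num_bytes>\n", argv[0]);
    return 1;
  }

  int raw_len = atoi(argv[1]);

  /* Allocate on the heap: a multi-megabyte array on the stack can overflow it */
  char *raw = malloc(raw_len);
  if (raw == NULL)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    return 1;
  }

  open_socket();
  read_message(raw_len, raw);

  close(mFD);
  free(raw);
  return 0;
}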

Some notes from my testing:


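  • If 'localhost' maps to the loopback address 127.0.0.1, the app hangs on the call to recv() and never returns.

  • If 'localhost' maps to the IP of the machine, so that the packets are routed via the Ethernet interface, the app completes successfully.

  • When a hang occurs, the server sends a "TCP Window Full" message and the client responds with a "TCP ZeroWindow" message (see the image and the attached tcpdump capture). From that point on it hangs forever, with the server sending keep-alives and the client sending ZeroWindow messages. The client never seems to expand its window to let the transfer complete.

  • During the hang, the output of netstat -a shows data in the server's send queue, but the client's receive queue is empty.

  • If I remove the MSG_WAITALL flag from the recv() call, the app completes successfully.

  • The hang only occurs when using the loopback interface on one particular machine. I suspect this may all be related to timing dependencies.

  • As I reduce the size of the 'file', the likelihood of the hang decreases.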
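
The receive-queue figure reported by netstat can also be sampled from inside the client, for example just before each recv() call. The following is only a sketch of one way to do that on Linux: the helper name pending_bytes is hypothetical and not part of the original test app, and it assumes the FIONREAD ioctl is available for the socket type in use.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Hypothetical helper (not in the original test app).
 * Returns the number of bytes waiting unread in the socket's receive
 * queue, or -1 if the ioctl fails. */
int pending_bytes(int fd)
{
  int pending = 0;

  assert(fd >= 0);

  /* FIONREAD reports how many bytes could be read right now */
  if (ioctl(fd, FIONREAD, &pending) < 0)
    return -1;

  return pending;
}
```

Logging this value next to the "reading %d bytes" message would show whether the client-side queue really is empty when the transfer stalls.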

The source for the test app can be found here:

Socket test source

The tcpdump capture from the loopback interface can be found here:

tcpdump capture

I reproduce the issue by issuing the following commands:

>  gcc socket_test.c -o socket_test
>  perl -e 'for (1..6000000){ print "a" }' | nc -l 60000
>  ./socket_test 6000000

This sees 6000000 bytes sent to the test app which tries to read the data using a single call to recv().

I would love to hear any suggestions on what I might be doing wrong or any further ways to debug the issue.

Recommended answer

MSG_WAITALL should block until all data has been received. From the manual page on recv:

This flag requests that the operation block until the full request is satisfied.

However, the buffers in the network stack probably are not large enough to contain everything, which is the reason for the error messages on the server. The client network stack simply can't hold that much data.

The solution is to increase the buffer sizes (the SO_RCVBUF option to setsockopt), split the message into smaller pieces, or receive smaller chunks and put them into your own buffer. The last is what I would recommend.
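
For the first option, a minimal sketch of requesting a larger receive buffer follows. The helper name set_receive_buffer is hypothetical. On Linux the kernel may double the requested value and cap it at the net.core.rmem_max limit, so the sketch reads the value back instead of assuming the request was honored. For TCP, the option is normally set before connect() so that it can influence the negotiated window.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not in the original post).
 * Asks the kernel for a receive buffer of `requested` bytes and returns
 * the size it actually granted, or -1 on error. */
int set_receive_buffer(int fd, int requested)
{
  int granted = 0;
  socklen_t len = sizeof(granted);

  assert(requested > 0);

  if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) < 0)
    return -1;

  /* Read the value back: the kernel may round, double, or cap the request */
  if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
    return -1;

  return granted;
}
```

In the test app this would be called on mFD between socket() and connect(). Whether a larger buffer alone avoids the hang on the affected machine is not established here.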

Edit: I see in your code that you already do what I suggested (read smaller chunks with your own buffering), so just remove the MSG_WAITALL flag and it should work.

Oh, and when recv returns zero, that means the other end has closed the connection, and you should close your end too.
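
Putting the recommendation together, a read loop without MSG_WAITALL might look like the sketch below. The function name recv_all and the CHUNK_SIZE value are illustrative assumptions, not taken from the original code. Each recv() asks for at most one chunk, so the kernel only has to hold a small amount of data at a time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#define CHUNK_SIZE 65536  /* illustrative value, not from the original post */

/* Hypothetical helper: reads up to `len` bytes into `buf` without
 * MSG_WAITALL. Returns the number of bytes stored (less than `len` if the
 * peer closed the connection early), or -1 on an error other than EINTR. */
ssize_t recv_all(int fd, void *buf, size_t len)
{
  char *cursor = buf;
  size_t left = len;

  assert(buf != NULL || len == 0);

  while (left > 0)
  {
    size_t want = left < CHUNK_SIZE ? left : CHUNK_SIZE;
    ssize_t num = recv(fd, cursor, want, 0);

    if (num == 0)
      break;                /* peer closed the connection */
    if (num < 0)
    {
      if (errno == EINTR)
        continue;           /* interrupted by a signal: retry */
      return -1;
    }

    cursor += num;
    left -= (size_t)num;
  }

  return (ssize_t)(len - left);
}
```

A caller that gets back fewer bytes than it asked for can treat that as the peer having closed the connection and call close() on its own descriptor.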
