Boost ASIO async_write_some is really slow


Problem description

I finally found the bottleneck of my server, and it turns out to be async_write; the same goes for async_write_some.

Here is the benchmark code:

struct timespec start, end;
clock_gettime(CLOCK_MONOTONIC, &start);

//boost::asio::async_write(mMainData.mSocket, boost::asio::buffer(pSendBuff->pBuffer, pSendBuff->dwUsedSize), mMainData.mStrand.wrap(boost::bind(&CServer::WriteHandler, pServer, this, pSendBuff, boost::asio::placeholders::error, boost::asio::placeholders::bytes_transferred)));
mMainData.mSocket.async_write_some(boost::asio::buffer(pSendBuff->pBuffer, pSendBuff->dwUsedSize), (boost::bind(&CServer::WriteHandler, pServer, this, pSendBuff, boost::asio::placeholders::error, boost::asio::placeholders::bytes_transferred)));

clock_gettime(CLOCK_MONOTONIC, &end);

timespec temp;
if ((end.tv_nsec - start.tv_nsec) < 0)
{
    temp.tv_sec = end.tv_sec - start.tv_sec - 1;
    temp.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
}
else
{
    temp.tv_sec = end.tv_sec - start.tv_sec;
    temp.tv_nsec = end.tv_nsec - start.tv_nsec;
}

pLogger->WriteToFile("./Logs/Benchmark_SendPacketP_AsyncWrite.txt", "dwDiff: %.4f\r\n", (float)temp.tv_nsec / 1000000.0f);

And the output:

-[2016.05.21 03:45:19] dwDiff: 0.0552ms
-[2016.05.21 03:45:19] dwDiff: 0.0404ms
-[2016.05.21 03:45:19] dwDiff: 0.0542ms
-[2016.05.21 03:45:20] dwDiff: 0.0576ms

This is absurdly slow. As this is a game server, I need to broadcast packets in room channels that have 300 players in one channel; imagine the network delay this causes for my players.

Of course, this test was done with only myself on the server.

Is my code wrong, or am I missing something in ASIO's implementation logic?

CXXFLAGS: -ggdb -ffunction-sections -Ofast -m64 -pthread -fpermissive -w -lboost_system -lboost_thread -Wall -fomit-frame-pointer
LDFLAGS: -Wl,-gc-sections -m64 -pthread -fpermissive -w -lboost_system -lboost_thread -lcurl

The hardware is:
Intel Xeon E3-1231v3 (4 cores, 8 threads)
64GB RAM
1Gbps uplink

I am spawning 8 ASIO workers.

So I was stepping inside async_write with a debugger and found this:

template <typename ConstBufferSequence, typename Handler>
void async_send(base_implementation_type& impl,
    const ConstBufferSequence& buffers,
    socket_base::message_flags flags, Handler& handler)
{
  bool is_continuation =
    boost_asio_handler_cont_helpers::is_continuation(handler);

  // Allocate and construct an operation to wrap the handler.
  typedef reactive_socket_send_op<ConstBufferSequence, Handler> op;
  typename op::ptr p = { boost::asio::detail::addressof(handler),
    boost_asio_handler_alloc_helpers::allocate(
      sizeof(op), handler), 0 };
  p.p = new (p.v) op(impl.socket_, buffers, flags, handler);

  BOOST_ASIO_HANDLER_CREATION((p.p, "socket", &impl, "async_send"));

  start_op(impl, reactor::write_op, p.p, is_continuation, true,
      ((impl.state_ & socket_ops::stream_oriented)
        && buffer_sequence_adapter<boost::asio::const_buffer,
          ConstBufferSequence>::all_empty(buffers)));
  p.v = p.p = 0;
}

Why would boost::asio call new in a library that is supposed to be high-performance? Is there any way to pre-create whatever it is trying to allocate? Sorry, I cannot profile the internals, as I am developing with VisualGDB in Microsoft Visual Studio, with the GCC 4.8.5 toolchain running in VMware.

Recommended answer

Without a profiler, trying to identify which instruction is the bottleneck will likely be a futile test of patience. Creating a minimal example may help identify the source of the problem in a particular environment. For example, in a controlled scenario where there is no contention for either the I/O or the io_service, I observe writes of ~0.015ms both when using the native write() and with Asio's async_write().
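A minimal measurement of that kind could look something like the sketch below. It assumes an already-connected boost::asio::ip::tcp::socket whose io_service is run on worker threads elsewhere, and it only times the blocking write and the initiation of async_write (the completion happens later); the 0.015ms figure quoted above comes from the answerer's own environment, not from this snippet.

#include <boost/asio.hpp>
#include <chrono>
#include <iostream>
#include <vector>

// Times one blocking write and the initiation of one asynchronous write on
// an already-connected socket. The caller must keep `data` alive until the
// asynchronous write has completed.
void time_one_write(boost::asio::ip::tcp::socket& socket,
                    const std::vector<char>& data)
{
    typedef std::chrono::steady_clock clock;

    clock::time_point t0 = clock::now();
    boost::asio::write(socket, boost::asio::buffer(data)); // blocking baseline
    clock::time_point t1 = clock::now();

    // Only the initiating call is measured; the transfer finishes on an
    // io_service worker thread later.
    boost::asio::async_write(socket, boost::asio::buffer(data),
        [](const boost::system::error_code&, std::size_t) {});
    clock::time_point t2 = clock::now();

    std::cout << "blocking write:         "
              << std::chrono::duration<double, std::milli>(t1 - t0).count() << " ms\n"
              << "async_write initiation: "
              << std::chrono::duration<double, std::milli>(t2 - t1).count() << " ms\n";
}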

The problem to be solved is writing the same message to 300 peers with minimal latency. One solution may be to parallelize the problem: instead of having a single job that writes the message to 300 peers serially, consider using n jobs that run in parallel, each writing the message to 300/n peers serially. Some rough estimates:


  • If 300 writes are performed serially, each taking 0.015ms (the average observed when using the native write() in a controlled environment), the final write will start 4.485ms after the first write.
  • If the 300 writes are batched based on the potential concurrency limit (8 in this example), there will be 8 jobs running in parallel, each performing 38 writes serially. If each write takes 0.0576ms (as observed on the actual system), the final write will start 2.13ms after the first write.

With the above estimates, parallelizing the problem takes roughly half the time to write to the 300 peers, even when each individual async_write operation takes longer than expected. Keep in mind that these are rough estimates; one would need to profile to determine the ideal amount of concurrency and to identify the underlying bottlenecks.
