TPL Dataflow Speedup?


Question

I wonder whether the following code can be optimized to run faster. I currently seem to max out at around 1.4 million simple messages per second on a pretty simple dataflow structure. I am aware that this sample passes and transforms messages synchronously; however, I am currently testing TPL Dataflow as a possible replacement for my own custom solution based on Tasks and concurrent collections. I know the term "concurrent" already suggests I run things in parallel, but for current testing purposes I pushed messages through my own solution synchronously, and I get about 5.1 million messages per second. What am I missing here? I read that TPL Dataflow was pushed as a high-throughput, low-latency solution, but so far I must be overlooking performance tweaks. Could anyone point me in the right direction, please?

using System;
using System.Diagnostics;
using System.Threading.Tasks.Dataflow;

class TPLDataFlowExperiments
{
    public TPLDataFlowExperiments()
    {
        var buf1 = new BufferBlock<int>();

        var transform = new TransformBlock<int, string>(t =>
            {
                return "";
            });

        var action = new ActionBlock<string>(s =>
            {
                //Thread.Sleep(100);
                //Console.WriteLine(s);
            });

        buf1.LinkTo(transform);
        transform.LinkTo(action);

        //Propagate all Completions down the flow
        buf1.Completion.ContinueWith(t =>
        {
            transform.Complete();
            transform.Completion.ContinueWith(u =>
            {
                action.Complete();
            });
        });

        Stopwatch watch = new Stopwatch();
        watch.Start();

        int cap = 10000000;
        for (int i = 0; i < cap; i++)
        {
            buf1.Post(i);
        }

        //Mark Buffer as Complete
        buf1.Complete();

        action.Completion.ContinueWith(t =>
            {
                watch.Stop();

                Console.WriteLine("All Blocks finished processing");
                // Multiply before dividing so integer division does not truncate the rate to a multiple of 1000
                Console.WriteLine("Units processed per second: " + cap * 1000L / watch.ElapsedMilliseconds);
            });

        Console.ReadLine();
    }
}


Recommended answer

I think this mostly comes down to one thing: your test is pretty much meaningless. All those blocks are supposed to do something, and use multiple cores and asynchronous operations to do that.

Also, in your test, it's likely that a lot of time is spent on synchronization. With more realistic code, each item will take some time to process, so there will be less contention between the blocks, and the actual overhead will be smaller than what you measured.

But to actually answer your question, yes, you're overlooking some performance tweaks. Specifically, SingleProducerConstrained, which means data structures with less locking can be used. If I use this on both blocks (the BufferBlock is completely useless here, you can safely remove it), the rate rises from about 3–4 million items per second to more than 5 million on my computer.
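
As a rough illustration of that suggestion, here is a minimal sketch of the same pipeline with the BufferBlock removed and SingleProducerConstrained set on both remaining blocks. The names SingleProducerSketch and RunPipeline are hypothetical, and the use of PropagateCompletion in place of the hand-written ContinueWith chain is my own assumption rather than part of the answer. Throughput will vary by machine, so treat the printed rate as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks.Dataflow;

static class SingleProducerSketch
{
    // Hypothetical helper: pushes `count` integers through
    // TransformBlock -> ActionBlock and returns how many items arrived.
    public static long RunPipeline(int count)
    {
        // SingleProducerConstrained is a promise from the caller that only one
        // producer at a time posts to the block, which may let the library
        // use data structures with less locking.
        var options = new ExecutionDataflowBlockOptions { SingleProducerConstrained = true };

        long processed = 0;
        var transform = new TransformBlock<int, string>(i => "", options);
        // MaxDegreeOfParallelism defaults to 1, so this increment is not racy.
        var action = new ActionBlock<string>(s => { processed++; }, options);

        // Assumption: PropagateCompletion stands in for the manual ContinueWith chain.
        transform.LinkTo(action, new DataflowLinkOptions { PropagateCompletion = true });

        // Post directly to the TransformBlock; no BufferBlock is needed in front of it.
        for (int i = 0; i < count; i++)
        {
            transform.Post(i);
        }

        transform.Complete();
        action.Completion.Wait(); // block until every item has been consumed
        return processed;
    }

    static void Main()
    {
        const int cap = 10000000;
        var watch = Stopwatch.StartNew();
        long processed = RunPipeline(cap);
        watch.Stop();

        Console.WriteLine("Items processed: " + processed);
        Console.WriteLine("Items per second: " + (processed / watch.Elapsed.TotalSeconds).ToString("N0"));
    }
}
```

Waiting on action.Completion before stopping the Stopwatch keeps the measurement tied to the whole pipeline rather than just the posting loop. The option is only safe when a single thread posts to each block, as the loop above does.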
