Any way to parallel yield in C#?


Question

I have multiple enumerators that enumerate over flat files. I originally ran each enumerator inside a Parallel.Invoke, with each Action adding to a BlockingCollection<Entity>, and that collection was returned to the caller through GetConsumingEnumerable():

public interface IFlatFileQuery
{
    IEnumerable<Entity> Run();
}

public class FlatFile1 : IFlatFileQuery
{
    public IEnumerable<Entity> Run()
    {
        // loop over a flat file and yield each result
        yield return Entity;
    }
} 

public class Main
{
    public IEnumerable<Entity> DoLongTask(ICollection<IFlatFileQuery> _flatFileQueries)
    {
        // do some other stuff that needs to be returned first:
        yield return Entity;

            // then enumerate and return the flat file data
        foreach (var entity in GetData(_flatFileQueries))
        {
            yield return entity;
        }
    }

    private IEnumerable<Entity> GetData(IEnumerable<IFlatFileQuery> _flatFileQueries)
    {
        var buffer = new BlockingCollection<Entity>(100);

        var actions = _flatFileQueries.Select(fundFileQuery => (Action)(() =>
        {
            foreach (var entity in fundFileQuery.Run())
            {
                buffer.TryAdd(entity, Timeout.Infinite);
            }
        })).ToArray();

        Task.Factory.StartNew(() =>
        {
            Parallel.Invoke(actions);

            buffer.CompleteAdding();
        });

        return buffer.GetConsumingEnumerable();
    }
}

However, after a bit of testing, it turns out that the code change below is about 20-25% faster.

private IEnumerable<Entity> GetData(IEnumerable<IFlatFileQuery> _flatFileQueries)
{
    return _flatFileQueries.AsParallel().SelectMany(ffq => ffq.Run());
}

The trouble with the code change is that it waits till all flat file queries are enumerated before it returns the whole lot that can then be enumerated and yielded.

Would it be possible to yield in the above bit of code somehow to make it even faster?

I should add that at most the combined results of all the flat file queries might only be 1000 or so Entities.

Edit: Changing it to the below doesn't make a difference to the run time. (R# even suggests to go back to the way it was)

private IEnumerable<Entity> GetData(IEnumerable<IFlatFileQuery> _flatFileQueries)
{
    foreach (var entity in _flatFileQueries.AsParallel().SelectMany(ffq => ffq.Run()))
    {
        yield return entity;
    }
}

Solution

"The trouble with the code change is that it waits till all flat file queries are enumerated before it returns the whole lot that can then be enumerated and yielded."

Let's prove that it's false with a simple example. First, let's create a TestQuery class that yields a single entity after a given delay. Second, let's execute several test queries in parallel and measure how long each one takes to yield its result.

public class TestQuery : IFlatFileQuery {

    private readonly int _sleepTime;

    public IEnumerable<Entity> Run() {
        Thread.Sleep(_sleepTime);
        return new[] { new Entity() };
    }

    public TestQuery(int sleepTime) {
        _sleepTime = sleepTime;
    }

}

internal static class Program {

    private static void Main() {
        Stopwatch stopwatch = Stopwatch.StartNew();
        var queries = new IFlatFileQuery[] {
            new TestQuery(2000),
            new TestQuery(3000),
            new TestQuery(1000)
        };
        foreach (var entity in queries.AsParallel().SelectMany(ffq => ffq.Run()))
            Console.WriteLine("Yielded after {0:N0} seconds", stopwatch.Elapsed.TotalSeconds);
        Console.ReadKey();
    }

}

This code prints:

Yielded after 1 seconds
Yielded after 2 seconds
Yielded after 3 seconds

You can see from this output that AsParallel() yields each result as soon as it's available, so everything works fine. Note that you might get different timings depending on the degree of parallelism (such as "2s, 5s, 6s" with a degree of parallelism of 1, which effectively makes the whole operation not parallel at all). This output comes from a 4-core machine.
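
If results still seem to arrive in batches with real flat files, one thing worth checking is PLINQ's merge buffering and its degree of parallelism. By default PLINQ uses ParallelMergeOptions.AutoBuffered, which can hold back a small chunk of results before handing them to the consumer. The sketch below requests unbuffered merging and an explicit degree of parallelism. SlowQuery, StreamingDemo and the delay values are hypothetical stand-ins, not part of the original code, and whether this helps in practice depends on the workload.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

public class Entity { }

public interface IFlatFileQuery
{
    IEnumerable<Entity> Run();
}

// Hypothetical stand-in for a real flat-file query: yields one entity after a delay.
public class SlowQuery : IFlatFileQuery
{
    private readonly int _sleepMs;

    public SlowQuery(int sleepMs) { _sleepMs = sleepMs; }

    public IEnumerable<Entity> Run()
    {
        Thread.Sleep(_sleepMs);
        yield return new Entity();
    }
}

public static class StreamingDemo
{
    // Runs the queries in parallel and asks PLINQ to pass each result
    // to the consumer as soon as it is produced (no output buffering).
    public static IEnumerable<Entity> GetData(IEnumerable<IFlatFileQuery> queries, int degreeOfParallelism)
    {
        return queries
            .AsParallel()
            .WithDegreeOfParallelism(degreeOfParallelism)
            .WithMergeOptions(ParallelMergeOptions.NotBuffered)
            .SelectMany(q => q.Run());
    }

    public static void Main()
    {
        var queries = new IFlatFileQuery[]
        {
            new SlowQuery(300),
            new SlowQuery(600),
            new SlowQuery(100)
        };

        var stopwatch = Stopwatch.StartNew();
        foreach (var entity in GetData(queries, queries.Length))
            Console.WriteLine("Yielded after {0} ms", stopwatch.ElapsedMilliseconds);
    }
}
```

NotBuffered trades some throughput for lower latency to the first result, so with only around 1000 entities in total it is worth measuring both settings rather than assuming one is better.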

Your long processing will probably scale with the number of cores, if there is no common bottleneck between the threads (such as a shared locked resource). You might want to profile your algorithm to see if there are slow parts that can be improved using tools such as dotTrace.
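
Before reaching for a profiler, a rough first measurement can already show whether a sequence streams or arrives all at once: record when the first item shows up and when enumeration finishes. The helper below is a minimal sketch of that idea. EnumerationTimer, Measure and the simulated 50 ms workload are illustrative names and values, not part of the original code, and the tuple syntax assumes C# 7.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

public static class EnumerationTimer
{
    // Enumerates the sequence once and reports the elapsed wall-clock time
    // at the first item and at the end. FirstItemMs is -1 for an empty sequence.
    public static (long FirstItemMs, long TotalMs, int Count) Measure<T>(IEnumerable<T> source)
    {
        var stopwatch = Stopwatch.StartNew();
        long firstItemMs = -1;
        int count = 0;

        foreach (var item in source)
        {
            if (count == 0)
                firstItemMs = stopwatch.ElapsedMilliseconds;
            count++;
        }

        return (firstItemMs, stopwatch.ElapsedMilliseconds, count);
    }

    public static void Main()
    {
        // Hypothetical workload: each element simulates 50 ms of file reading.
        Func<int, int> slow = i => { Thread.Sleep(50); return i; };
        var work = Enumerable.Range(0, 8);

        var sequential = Measure(work.Select(slow));
        var parallel = Measure(work.AsParallel().Select(slow));

        Console.WriteLine("Sequential: first item at {0} ms, done at {1} ms",
            sequential.FirstItemMs, sequential.TotalMs);
        Console.WriteLine("Parallel:   first item at {0} ms, done at {1} ms",
            parallel.FirstItemMs, parallel.TotalMs);
    }
}
```

If the first item arrives only shortly before the total time, the sequence is effectively being materialized before the consumer sees anything. If it arrives early, the pipeline is streaming and the remaining cost is in the queries themselves.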
