Parallel Linq - Use more threads than processors (for non-CPU bound tasks)


Question

I'm using Parallel LINQ, and I'm trying to download many URLs concurrently using essentially this code:

int threads = 10;
Dictionary<string, string> results = urls.AsParallel( threads ).ToDictionary( url => url, url => GetPage( url ) );

Since downloading web pages is network bound rather than CPU bound, using more threads than I have processors/cores is very beneficial, since most of each thread's time is spent waiting on the network. However, judging from the fact that running the above with threads = 2 performs the same as threads = 10 on my dual-core machine, I suspect that the thread count passed to AsParallel is capped at the number of cores.

Is there any way to override this behavior? Is there a similar library available that doesn't have this limitation?

(I've found such a library for Python, but I need something that works in .NET.)

Recommended answer

Do the URLs refer to the same server? If so, it could be that you are hitting the HTTP connection limit rather than the threading limit. There's an easy way to tell. Change your code to:

int threads = 10;
Dictionary<string, string> results = urls.AsParallel(threads)
    .ToDictionary(url => url, 
                  url => {
                      Console.WriteLine("On thread {0}",
                                        Thread.CurrentThread.ManagedThreadId);
                      return GetPage(url);
                  });
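If the log shows many different thread IDs but throughput stays flat, the per-host HTTP connection limit is the more likely bottleneck. The following is a minimal sketch, assuming GetPage is built on the classic .NET Framework HTTP stack (WebClient or HttpWebRequest), where ServicePointManager.DefaultConnectionLimit caps concurrent connections per host and defaults to 2 for ordinary client applications. The class name ConnectionLimitSketch and the value 10 are illustrative assumptions, not part of the original code, and on newer .NET runtimes ServicePointManager is marked obsolete and may not affect HttpClient.

```csharp
using System;
using System.Net;

public static class ConnectionLimitSketch
{
    // Raises the per-host connection cap and returns the previous value.
    // Assumption: this must run before the first request is issued,
    // because existing ServicePoint instances keep their old limit.
    public static int Raise(int newLimit)
    {
        int previous = ServicePointManager.DefaultConnectionLimit;
        ServicePointManager.DefaultConnectionLimit = newLimit;
        return previous;
    }

    public static void Main()
    {
        int before = Raise(10);
        Console.WriteLine("Limit before: {0}", before);
        Console.WriteLine("Limit after: {0}", ServicePointManager.DefaultConnectionLimit);
    }
}
```

If raising the limit changes the timings for URLs on a single host, the connection cap rather than the thread count was holding the downloads back.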



EDIT: Hmm. I can't get ToDictionary() to parallelise at all with a bit of sample code. It works fine for Select(url => GetPage(url)) but not for ToDictionary. Will search around a bit.

EDIT: Okay, I still can't get ToDictionary to parallelise, but you can work around that. Here's a short but complete program:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Linq;
using System.Linq.Parallel;

public class Test
{

    static void Main()
    {
        var urls = Enumerable.Range(0, 100).Select(i => i.ToString());

        int threads = 10;
        Dictionary<string, string> results = urls.AsParallel(threads)
            .Select(url => new { Url=url, Page=GetPage(url) })
            .ToDictionary(x => x.Url, x => x.Page);
    }

    static string GetPage(string x)
    {
        Console.WriteLine("On thread {0} getting {1}",
                          Thread.CurrentThread.ManagedThreadId, x);
        Thread.Sleep(2000);
        return x;
    }
}



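So, how many threads does this use? 5. Why? Goodness knows. I have 2 processors, so that's not it, and we've specified 10 threads, so that's not it either. It still uses 5 even if I change GetPage to hammer the CPU.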
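The AsParallel(threads) overload and the System.Linq.Parallel namespace used above appear to come from a pre-release build of Parallel LINQ. In the released .NET Framework 4 and later, the degree of parallelism is requested with WithDegreeOfParallelism instead. Below is a sketch of the same Select-then-ToDictionary workaround against that API, assuming .NET 4 or later. FetchStub is a hypothetical stand-in for GetPage, and the documented meaning of the setting is a maximum number of concurrently executing tasks, so the thread pool may still take a while to ramp up to it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class DegreeOfParallelismSketch
{
    // Hypothetical stand-in for GetPage: sleeps to simulate network latency.
    static string FetchStub(string url)
    {
        Thread.Sleep(200);
        return "page:" + url;
    }

    // Same shape as the workaround above: do the slow work in Select,
    // then build the dictionary from the already-fetched pairs.
    public static Dictionary<string, string> DownloadAll(IEnumerable<string> urls, int threads)
    {
        return urls.AsParallel()
                   .WithDegreeOfParallelism(threads) // upper bound on concurrent tasks
                   .Select(url => new { Url = url, Page = FetchStub(url) })
                   .ToDictionary(x => x.Url, x => x.Page);
    }

    public static void Main()
    {
        var urls = Enumerable.Range(0, 20).Select(i => i.ToString());
        Dictionary<string, string> results = DownloadAll(urls, 10);
        Console.WriteLine("Fetched {0} pages", results.Count);
    }
}
```

Logging Thread.CurrentThread.ManagedThreadId inside FetchStub, as in the program above, is one way to check how many threads actually take part on a given machine.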

If you only need to use this for one particular task, and you don't mind slightly smelly code, you might be best off implementing it yourself, to be honest.
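One way the "implement it yourself" route could look is a fixed set of dedicated threads draining a shared queue. This is only a sketch under stated assumptions: ManualDownloader, DownloadAll and the fetch delegate are hypothetical names introduced here, the concurrent collections require .NET 4 or later, and the fetch delegate stands in for GetPage. Error handling is deliberately left out, so an exception thrown by fetch on a worker thread would take the process down.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ManualDownloader
{
    // Runs fetch over every URL on exactly workerCount dedicated threads,
    // independent of the number of cores or the thread pool's heuristics.
    public static Dictionary<string, string> DownloadAll(
        IEnumerable<string> urls, int workerCount, Func<string, string> fetch)
    {
        var pending = new ConcurrentQueue<string>(urls);
        var results = new ConcurrentDictionary<string, string>();

        var workers = Enumerable.Range(0, workerCount)
            .Select(_ => new Thread(() =>
            {
                string url;
                // Each worker keeps pulling URLs until the queue is empty.
                while (pending.TryDequeue(out url))
                {
                    results[url] = fetch(url);
                }
            }))
            .ToList();

        workers.ForEach(t => t.Start());
        workers.ForEach(t => t.Join()); // wait for all downloads to finish

        return results.ToDictionary(kv => kv.Key, kv => kv.Value);
    }

    public static void Main()
    {
        var urls = Enumerable.Range(0, 20).Select(i => i.ToString());
        var pages = DownloadAll(urls, 10, url =>
        {
            Thread.Sleep(200); // simulated network wait
            return "page:" + url;
        });
        Console.WriteLine("Fetched {0} pages", pages.Count);
    }
}
```

Dedicated threads trade efficiency for predictability: each blocked download occupies a whole thread, but the number of simultaneous requests is exactly what was asked for.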
