C# Download data from huge list of urls

Question

I have a huge list of web pages that each display a status, which I need to check. Some URLs are on the same site; another set is located on a different site.

Right now I'm trying to do this in parallel using code like the one below, but I have the feeling that I'm causing too much overhead.

while(ListOfUrls.Count > 0){
  Parallel.ForEach(ListOfUrls, url =>
  {
    WebClient webClient = new WebClient();
    webClient.DownloadString(url);
    ... run my checks here.. 
  });

  ListOfUrls = GetNewUrls.....
}

Can this be done with less overhead, and with more control over how many WebClients and connections I use and reuse, so that in the end the job gets done faster?

Recommended answer

Parallel.ForEach is good for CPU-bound computational tasks, but it will unnecessarily block pool threads on synchronous IO-bound calls such as DownloadString in your case. You can improve the scalability of your code and reduce the number of threads it may use by using DownloadStringTaskAsync and tasks instead:

// non-blocking async method
async Task<string> ProcessUrlAsync(string url)
{
    using (var webClient = new WebClient())
    {
        string data = await webClient.DownloadStringTaskAsync(new Uri(url));
        // run checks here.. 
        return data;
    }
}

// ...

if (ListOfUrls.Count > 0) {
    var tasks = new List<Task>();
    foreach (var url in ListOfUrls)
    {
      tasks.Add(ProcessUrlAsync(url));
    }

    Task.WaitAll(tasks.ToArray()); // blocking wait

    // could use await here and make this method async:
    // await Task.WhenAll(tasks.ToArray());
}
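
The code above starts every download at once. The question also asks for control over how many clients and connections are in use, so here is a minimal sketch of one possible way to cap concurrency. It is not part of the original answer: it swaps WebClient for a single shared HttpClient, and the names ThrottledDownloader, RunAsync, DownloadAllAsync and maxConcurrency are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not from the original answer.
public static class ThrottledDownloader
{
    // Assumption: one shared HttpClient so connections to the same host can be reused.
    private static readonly HttpClient Client = new HttpClient();

    // Calls fetch for every url, with at most maxConcurrency calls in flight.
    // Results come back in the same order as the input urls.
    public static async Task<string[]> RunAsync(
        IEnumerable<string> urls,
        int maxConcurrency,
        Func<string, Task<string>> fetch)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();      // wait for a free slot without blocking a thread
                try
                {
                    return await fetch(url); // run checks on the result here
                }
                finally
                {
                    gate.Release();          // free the slot even if fetch throws
                }
            }).ToList();                     // ToList starts all the tasks now

            return await Task.WhenAll(tasks);
        }
    }

    // Convenience wrapper that downloads each url with the shared HttpClient.
    public static Task<string[]> DownloadAllAsync(IEnumerable<string> urls, int maxConcurrency)
        => RunAsync(urls, maxConcurrency, url => Client.GetStringAsync(url));
}
```

A call such as await ThrottledDownloader.DownloadAllAsync(ListOfUrls, 10) would then keep at most ten requests in flight; the value 10 is arbitrary and would need tuning. On .NET Framework the per-host connection limit (ServicePointManager.DefaultConnectionLimit) may also cap throughput, so it is worth checking if many URLs point at the same site.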
