C#: Download data from a huge list of URLs
Problem description
I have a huge list of web pages that each display a status, which I need to check. Some URLs are within the same site; another set is located on a different site.
Right now I'm trying to do this in parallel with code like the following, but I have the feeling that I'm causing too much overhead.
while (ListOfUrls.Count > 0)
{
    Parallel.ForEach(ListOfUrls, url =>
    {
        WebClient webClient = new WebClient();
        webClient.DownloadString(url);
        ... run my checks here..
    });
    ListOfUrls = GetNewUrls.....
}
Can this be done with less overhead, and with more control over how many WebClient instances and connections I use or reuse, so that in the end the job gets done faster?
Recommended answer
Parallel.ForEach is good for CPU-bound computational tasks, but in your case it will unnecessarily block pool threads on synchronous IO-bound calls like DownloadString. You can improve the scalability of your code and reduce the number of threads it may use by using DownloadStringTaskAsync and tasks instead:
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;

// non-blocking async method
async Task<string> ProcessUrlAsync(string url)
{
    using (var webClient = new WebClient())
    {
        // awaiting frees the calling thread while the download is in flight
        string data = await webClient.DownloadStringTaskAsync(new Uri(url));
        // run checks here..
        return data;
    }
}

// ...

if (ListOfUrls.Count > 0)
{
    // start all downloads; each one begins as soon as its task is created
    var tasks = new List<Task>();
    foreach (var url in ListOfUrls)
    {
        tasks.Add(ProcessUrlAsync(url));
    }

    Task.WaitAll(tasks.ToArray()); // blocking wait
    // could use await here and make this method async:
    // await Task.WhenAll(tasks.ToArray());
}
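The code above starts every download at once, which does not by itself address the question's wish for control over how many requests run concurrently. One possible way to add that control is to gate the tasks with a SemaphoreSlim. The following is only a minimal sketch under stated assumptions: ThrottledDownloader, ProcessAllAsync, the fetch delegate, and maxConcurrency are hypothetical names introduced for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not from the original answer.
static class ThrottledDownloader
{
    // Calls fetch(url) for every URL, allowing at most maxConcurrency
    // calls to be in flight at any moment (maxConcurrency must be >= 1).
    // Results are returned in the same order as the input URLs.
    public static async Task<string[]> ProcessAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetch,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Select(async url =>
            {
                // Wait asynchronously for a free slot; no thread is blocked.
                await gate.WaitAsync();
                try
                {
                    return await fetch(url);
                }
                finally
                {
                    // Free the slot even if fetch throws.
                    gate.Release();
                }
            }).ToList(); // ToList forces the lazy Select to create all tasks now

            return await Task.WhenAll(tasks);
        }
    }
}
```

Assuming ListOfUrls is a List<string>, a call could look like: var pages = await ThrottledDownloader.ProcessAllAsync(ListOfUrls, ProcessUrlAsync, 10); where 10 is an arbitrary limit to tune against the target sites. Taking fetch as a delegate keeps the throttling separate from the download code, so it can be exercised without network access.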