HttpWebResponse + StreamReader Very Slow

Question

I'm trying to implement a limited web crawler in C# (for a few hundred sites only) using HttpWebRequest.GetResponse() and StreamReader.ReadToEnd(). I also tried using StreamReader.Read() in a loop to build my HTML string.

I'm only downloading pages that are about 5-10 KB in size.

It's all very slow! For example, the average GetResponse() time is about half a second, while the average StreamReader.ReadToEnd() time is about 5 seconds!

All of the sites should be very fast, since they are very close to my location and have fast servers (in Internet Explorer they take practically no time to download), and I am not using any proxy.

My crawler has about 20 threads reading simultaneously from the same site. Could this be causing the problem?

How do I reduce StreamReader.ReadToEnd times DRASTICALLY?

Recommended Answer

HttpWebRequest may be taking a while to detect your proxy settings. Try adding this to your application config:

<system.net>
  <defaultProxy enabled="false">
    <proxy/>
    <bypasslist/>
    <module/>
  </defaultProxy>
</system.net>
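If you'd rather not touch the config file, a similar effect can be had in code. This is a minimal sketch, assuming .NET with C# 9 top-level statements; `CreateDirectRequest` is a hypothetical helper name and `http://example.com/` is a placeholder URL. It relies only on the `HttpWebRequest.Proxy` and `WebRequest.DefaultWebProxy` properties, where `null` means "use no proxy".

```csharp
using System;
using System.Net;

// Hypothetical helper: create a request that skips proxy detection.
// With Proxy set to null, HttpWebRequest connects directly instead of
// first looking up the system proxy settings.
static HttpWebRequest CreateDirectRequest(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Proxy = null; // applies to this request only
    return request;
}

// Process-wide alternative, roughly what <defaultProxy enabled="false"> does:
// requests created after this line use no default proxy.
WebRequest.DefaultWebProxy = null;

// No network traffic happens yet; GetResponse() is what opens the connection.
HttpWebRequest req = CreateDirectRequest("http://example.com/");
Console.WriteLine(req.Proxy == null); // prints True
```

Setting the proxy per request keeps the change local to the crawler, while the static property affects every WebRequest created later in the process, which is closer to what the config entry does.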

You might also see a slight performance gain from buffering your reads to reduce the number of calls made to the underlying operating system socket:

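// response is the HttpWebResponse returned by request.GetResponse().
// Requires: using System.IO;
// Closing the response stream (done here by the using blocks) also releases the connection for reuse.
string pageContent;
using (Stream stream = response.GetResponseStream())
using (BufferedStream buffer = new BufferedStream(stream))
using (StreamReader reader = new StreamReader(buffer))
{
  pageContent = reader.ReadToEnd();
}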
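Pulling the buffered read into a helper that takes any `Stream` makes it easy to try out without a network connection. A minimal sketch, assuming C# 9 top-level statements; `ReadAllBuffered` is a hypothetical name, the 64 KB buffer size is an arbitrary assumption rather than a tuned value, and a `MemoryStream` stands in for `response.GetResponseStream()`.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: read a whole stream as text through a BufferedStream,
// so the underlying stream is read in larger chunks.
// StreamReader detects a byte order mark and otherwise decodes as UTF-8.
static string ReadAllBuffered(Stream stream, int bufferSize = 64 * 1024)
{
    // Disposing the reader also disposes the BufferedStream and the stream it wraps.
    using (var buffer = new BufferedStream(stream, bufferSize))
    using (var reader = new StreamReader(buffer))
    {
        return reader.ReadToEnd();
    }
}

// Offline usage: a MemoryStream stands in for response.GetResponseStream().
var fake = new MemoryStream(Encoding.UTF8.GetBytes("<html><body>hello</body></html>"));
Console.WriteLine(ReadAllBuffered(fake)); // prints <html><body>hello</body></html>
```

In the crawler itself, the call would look like `pageContent = ReadAllBuffered(response.GetResponseStream());`.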
