Optimizing download of many html files


Question

I have about a million URLs pointing to HTML pages on a public web server that I want to save to my disk. Each of these is about the same size, ~30 kilobytes. My URL list is split about evenly across 20 folders on disk, so for simplicity I create one Task per folder, and in each task I download one URL after the other, sequentially. That gives me about 20 parallel requests at any time. I'm on a relatively crappy 5 Mbps DSL connection.

This represents several gigabytes of data, so I'm expecting the process to take several hours, but I'm wondering if I could make the approach any more efficient. Am I likely making the most of my connection? How can I measure that? Is 20 parallel downloads a good number, or should I dial it up or down?

The language is F#. I'm using WebClient.DownloadFile for every URL, one WebClient per task.
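
For illustration, here is a minimal sketch of the setup described above. It assumes each folder's URL list is a text file with one URL per line; the paths (C:\urls, C:\pages), the file naming scheme and the downloadFolder helper are made up for the example:

open System.IO
open System.Net
open System.Threading.Tasks

// One task per folder; within a task, URLs are downloaded sequentially.
let downloadFolder (urlListFile: string) (outputDir: string) =
    Directory.CreateDirectory outputDir |> ignore
    use webClient = new WebClient()
    File.ReadLines urlListFile
    |> Seq.iteri (fun i url ->
        // Ask for a compressed response on every request (see the EDIT below).
        webClient.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate")
        webClient.DownloadFile(url, Path.Combine(outputDir, sprintf "%d.html" i)))

// ~20 folders means ~20 requests in flight at any time.
let tasks =
    Directory.GetFiles(@"C:\urls", "*.txt")   // hypothetical layout
    |> Array.map (fun listFile ->
        Task.Run(fun () ->
            downloadFolder listFile (Path.Combine(@"C:\pages", Path.GetFileNameWithoutExtension listFile))))

Task.WaitAll tasks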

==================================

EDIT: One thing that made a huge difference was adding a certain header to the request:

open System.Net
// Tell the server we accept compressed (gzip/deflate) responses
let webClient = new WebClient()
webClient.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate")

This cut the size of downloads from about 32 KB to 9 KB, resulting in enormous speed gains and disk space savings. Thanks to TerryE for mentioning it!

Answer

If you are using a downloader API, then make sure that it is issuing an

Accept-Encoding: gzip, deflate

request header, so that the site you are scraping knows to return compressed HTML. (Most web servers will be configured to compress HTML data streams if the client uses this request header to let the server know that it will accept compressed data streams.)

This will reduce the data transferred by roughly a factor of 4. (E.g. this page is about 40K of raw HTML, but only about 10K was transferred to my browser, because the HTML was compressed.)
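
If the downloader is the WebClient used in the question, one way to make sure compressed transfers are requested, and that the response is transparently decompressed before it is written to disk, is to enable automatic decompression on the underlying HttpWebRequest. This is only a sketch under that assumption; the class name is illustrative:

open System
open System.Net

// WebClient subclass that advertises gzip/deflate support and decompresses
// the response automatically, so saved files contain plain HTML.
type DecompressingWebClient() =
    inherit WebClient()
    override this.GetWebRequest(address: Uri) =
        let request = base.GetWebRequest(address)
        match request with
        | :? HttpWebRequest as http ->
            http.AutomaticDecompression <- DecompressionMethods.GZip ||| DecompressionMethods.Deflate
        | _ -> ()
        request

Used in place of new WebClient() in the download loop, this should make adding the Accept-Encoding header by hand unnecessary: setting AutomaticDecompression causes the header to be sent and the compressed stream to be inflated on the fly.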
