Downloading PDF files using WebRequest


Question


I'm trying to download a number of pdf files automagically given a list of urls.

Here's the code I have:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

request.Method = "GET";

var encoding = new UTF8Encoding();

request.Headers.Add(HttpRequestHeader.AcceptLanguage, "en-gb,en;q=0.5");
request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate");

request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0";

HttpWebResponse resp = (HttpWebResponse)request.GetResponse();

BinaryReader reader = new BinaryReader(resp.GetResponseStream());

FileStream stream = new FileStream("output/" + date.ToString("yyyy-MM-dd") + ".pdf",FileMode.Create);

BinaryWriter writer = new BinaryWriter(stream);

while (reader.PeekChar() != -1)
{
    writer.Write(reader.Read());
}
writer.Flush();
writer.Close();

So, I know the first part works. I was originally getting it and reading it using a TextReader - but that gave me corrupted pdf files (since pdfs are binary files).

Right now if I run it, reader.PeekChar() is always -1 and nothing happens - I get an empty file.

While debugging it, I noticed that reader.Read() was actually giving different numbers when I was invoking it - so maybe Peek is broken.

So I tried something very dirty:

try
{
    while (true)
    {
        writer.Write(reader.Read());
    }
}
catch
{
}
writer.Flush();
writer.Close();

Now I'm getting a very tiny file with some garbage in it, but it's still not what I'm looking for.

So, can anyone point me in the right direction?

Additional Information:

The response headers don't suggest it's compressed or anything else.

HTTP/1.1 200 OK
Content-Type: application/pdf
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Fri, 10 Aug 2012 11:15:48 GMT
Content-Length: 109809

Solution

Skip the BinaryReader and BinaryWriter and just copy the input stream to the output FileStream. In short:

var fileName = "output/" + date.ToString("yyyy-MM-dd") + ".pdf";
using (var stream = File.Create(fileName))
  resp.GetResponseStream().CopyTo(stream);
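
Some context on why the original loop misbehaves: BinaryReader.PeekChar() is documented to return -1 when the underlying stream does not support seeking, which is the case for a network response stream, so the loop never runs. In addition, reader.Read() decodes a character using the reader's encoding and returns an int, so writer.Write(int) emits four bytes for each character read instead of copying the raw bytes. A slightly fuller sketch of the stream-copy approach might look like the following. The class name PdfDownloader, the methods SaveStream and Download, and their parameters are illustrative and not part of the original post, and replacing the hand-written Accept-Encoding header with AutomaticDecompression is an assumption about what is wanted here.

```csharp
using System;
using System.IO;
using System.Net;

static class PdfDownloader
{
    // Hypothetical helper: writes every byte of 'input' to 'fileName'
    // without any text decoding, so binary content survives intact.
    public static void SaveStream(Stream input, string fileName)
    {
        using (var output = File.Create(fileName))
        {
            input.CopyTo(output);
        }
    }

    // Hypothetical helper: downloads a single URL to a single file.
    public static void Download(string url, string fileName)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";

        // Assumption: let the framework request and undo gzip/deflate
        // rather than adding an Accept-Encoding header by hand, which
        // would leave any compressed body undecoded.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        // Dispose the response and its stream when the copy finishes.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var input = response.GetResponseStream())
        {
            SaveStream(input, fileName);
        }
    }
}
```

Calling PdfDownloader.Download(url, fileName) once per entry in the URL list, with a distinct file name for each, would cover the batch case from the question.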
