Python - Download PDF from (non-.pdf) URL

Views: 261 | Posted: 2016/6/15 21:19:48 | Tags: python, asp.net, pdf


Problem description

I am trying to download about 20 PDFs from a site that requires a login. This is what I have so far, but it fails to download any valid PDFs (they are all corrupted). I am also new to Python.

import mechanize
import urllib2

def download_file(download_url):
    response = urllib2.urlopen(download_url)
    print response.geturl() 
    print response.read()
    file = open("document.pdf", 'wb')
    file.write(response.read())
    file.close()

brwser = mechanize.Browser()
brwser.addheaders = [('User-agent', 'Firefox')]
response = brwser.open(url)

brwser.select_form(nr = 0)
brwser.form['UserName'] = 'username'
brwser.form['Password'] = 'password'
nextpage = brwser.submit()

# Navigate to the page I want

for link in brwser.links():
    if link.text == 'Some pdf':
        request = brwser.follow_link(link)
        download_file(link.url)

I am not sure what to try. The URLs for the PDFs look like this:

https://example.com/something/source2.aspx?id=e9a9bfdc-7d97-e411-9e03-76439cf4d30e

Also, the output of response.read() is as follows:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>
Source
</title>
<script type='text/javascript'>
   window.onload = function () {
       var url = window.location.href.replace('source.aspx?', 'source2.aspx?');
       window.location = url;
   };
</script>
</head>
<body>
<div style='position:fixed; height:100%; width:100%; overflow:hidden; top:100px; left:100px;'>Loading, please wait.</div>
</body>
</html>
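
The response above is an HTML page rather than a PDF. Its script replaces `source.aspx?` with `source2.aspx?` in the current address and navigates there, and a client that does not execute JavaScript never takes that step. One option is to apply the same replacement by hand before requesting the file. The following is a minimal sketch of that idea, not a confirmed fix: the function name `rewrite_source_url` is hypothetical, and the `source.aspx` form of the example URL is an assumption derived from the script.

```python
def rewrite_source_url(url):
    """Mimic the page's script: swap 'source.aspx?' for 'source2.aspx?'."""
    # JavaScript's String.replace with a string pattern replaces only the
    # first match, so count=1 keeps the behavior the same here.
    return url.replace("source.aspx?", "source2.aspx?", 1)


# Hypothetical example: the id comes from the question, the 'source.aspx'
# path is an assumption based on the script shown above.
example = "https://example.com/something/source.aspx?id=e9a9bfdc-7d97-e411-9e03-76439cf4d30e"
print(rewrite_source_url(example))
```

A URL that already points at `source2.aspx?` passes through unchanged, so the helper can be applied to every link without checking first.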

So how do I download these files?
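
Two details in the code above may explain the corrupted files. First, `response.read()` is called twice in `download_file`; the first call (inside `print`) consumes the body, so the second call returns an empty string and that is what gets written. Second, `urllib2.urlopen` is a separate client that does not carry the cookies of the logged-in `mechanize` session. The sketch below is one possible way to address both, under stated assumptions: `save_pdf` and the two stand-in openers are hypothetical names, the code is written for Python 3 although the question uses Python 2, and the `%PDF-` check relies on PDF files normally starting with those bytes.

```python
import io
import os
import tempfile

PDF_MAGIC = b"%PDF-"  # PDF files normally begin with these bytes


def save_pdf(open_url, url, path):
    """Fetch url through open_url and write it to path only if it looks like a PDF.

    open_url is any callable that takes a URL and returns an object with a
    read() method, for example the open method of a logged-in browser object.
    """
    response = open_url(url)
    data = response.read()  # read exactly once; a second read() would be empty
    if not data.startswith(PDF_MAGIC):
        return False  # probably an HTML login or redirect page
    with open(path, "wb") as f:
        f.write(data)
    return True


# Offline demonstration with stand-in openers (hypothetical, no network access).
def fake_pdf_opener(url):
    return io.BytesIO(b"%PDF-1.4\n% stand-in bytes, not a real document\n")


def fake_html_opener(url):
    return io.BytesIO(b"<!DOCTYPE html><html>Loading, please wait.</html>")


demo_dir = tempfile.mkdtemp()
pdf_path = os.path.join(demo_dir, "document.pdf")
html_path = os.path.join(demo_dir, "not_a_pdf.pdf")
saved_pdf = save_pdf(fake_pdf_opener, "https://example.com/a", pdf_path)
saved_html = save_pdf(fake_html_opener, "https://example.com/b", html_path)
print(saved_pdf, saved_html)
```

In the question's script, passing `brwser.open` as `open_url` would keep the request inside the authenticated session instead of starting a new one with `urllib2`. Using a distinct file name per link would also avoid overwriting `document.pdf` on each iteration.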
