How to download a file using Python in a 'smarter' way?

Question

I need to download several files over HTTP in Python.

The most obvious way to do it is to use urllib2:

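import urllib2
from contextlib import closing

# urllib2 is Python 2 only; on Python 3 the same call lives in urllib.request
with closing(urllib2.urlopen('http://server.com/file.html')) as u:
    # Open in binary mode so PDFs and other non-text files are not corrupted
    with open('file.html', 'wb') as localFile:
        localFile.write(u.read())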

But I also have to deal with awkward URLs such as http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf. When the file is downloaded through a browser, it gets a human-readable name, e.g. accounts.pdf.

Is there any way to handle that in Python, so that I don't need to know the file names and hardcode them into my script?

Recommended answer

Download scripts like that tend to send a header telling the user agent what to name the file:

Content-Disposition: attachment; filename="the filename.ext"

If you can grab that header, you can get the proper filename.

There's another thread that offers a little code for grabbing the Content-Disposition header:

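import urllib2
remotefile = urllib2.urlopen('http://example.com/somefile.zip')
# getheader() returns None (instead of raising KeyError) if the header is missing
disposition = remotefile.info().getheader('Content-Disposition')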
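
For Python 3, where urllib2 has become urllib.request, the whole idea might be sketched roughly as follows. The helper names filename_from_headers and download, the "download.bin" default, and the fallback to the last URL path segment are assumptions made for illustration and do not come from the thread. The header is parsed with the standard library's email.message.Message rather than by hand, and any directory part is stripped from the server-supplied name.

```python
import os
import shutil
import urllib.request
from email.message import Message
from urllib.parse import unquote, urlsplit


def filename_from_headers(content_disposition, url, default="download.bin"):
    """Pick a local file name from a Content-Disposition value, else from the URL.

    Hypothetical helper: the name and the fallback rules are illustrative only.
    """
    if content_disposition:
        # Let the email package parse the "attachment; filename=..." parameters.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # Keep only the final component so the server cannot choose a directory.
            name = os.path.basename(name.replace("\\", "/"))
            if name not in ("", ".", ".."):
                return name
    # No usable header: fall back to the last segment of the URL path.
    path_name = os.path.basename(unquote(urlsplit(url).path))
    return path_name or default


def download(url, directory="."):
    """Download url into directory and return the path that was written."""
    with urllib.request.urlopen(url) as response:
        name = filename_from_headers(response.headers.get("Content-Disposition"), url)
        target = os.path.join(directory, name)
        with open(target, "wb") as local_file:
            # Stream the body to disk instead of reading it all into memory.
            shutil.copyfileobj(response, local_file)
    return target
```

Calling download() on each URL in a list would then save every file under whatever name the server suggests. If the server sends no Content-Disposition header, the last path segment is used, which for the URL in the question would be "somemore" rather than "accounts.pdf".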
