Download file from web in Python 3


Question

I am creating a program that will download a .jar (java) file from a web server, by reading the URL that is specified in the .jad file of the same game/application. I'm using Python 3.2.1

I've managed to extract the URL of the JAR file from the JAD file (every JAD file contains the URL to the JAR file), but as you may imagine, the extracted value is of type str.

The relevant function looks like this:

def downloadFile(URL=None):
    import httplib2
    h = httplib2.Http(".cache")
    resp, content = h.request(URL, "GET")
    return content

downloadFile(URL_from_file)

However, I always get an error saying that the type in the function above has to be bytes, not string. I've tried URL.encode('utf-8') and also bytes(URL, encoding='utf-8'), but I always get the same or a similar error.

So basically my question is how to download a file from a server when the URL is stored in a string type?

Answer

If you want to obtain the contents of a web page into a variable, just read the response of urllib.request.urlopen:

import urllib.request
...
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read()      # a `bytes` object
text = data.decode('utf-8') # a `str`; this step can't be used if data is binary
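One small refinement (my addition, not part of the original answer): the response's Content-Type header may declare a charset, which you can prefer over a hardcoded 'utf-8'. The sketch below uses a temporary local file and a file:// URL purely so it runs without a network; that setup is illustrative only and assumes a POSIX-style path.

```python
import os
import tempfile
import urllib.request

# Illustrative offline setup: write a small local file and address it
# with a file:// URL so the sketch needs no network access.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write('hello, world')
url = 'file://' + path

with urllib.request.urlopen(url) as response:
    data = response.read()                      # a `bytes` object
    # Prefer the charset declared in the Content-Type header, if any;
    # fall back to UTF-8 when the server does not declare one.
    charset = response.headers.get_content_charset() or 'utf-8'
    text = data.decode(charset)

print(text)
os.remove(path)
```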


The easiest way to download and save a file is to use the urllib.request.urlretrieve function:

import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
urllib.request.urlretrieve(url, file_name)

import urllib.request
...
# Download the file from `url`, save it in a temporary directory and get the
# path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable:
file_name, headers = urllib.request.urlretrieve(url)
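urlretrieve also accepts an optional reporthook callable, which it invokes once before the first block and once after each block read, so you can track download progress. A self-contained sketch follows; the local file and file:// URL stand in for a remote server so it runs offline (and assume a POSIX-style path), while the reporthook parameter itself is real urlretrieve API.

```python
import os
import tempfile
import urllib.request

# Illustrative offline setup: a local file addressed via file:// stands
# in for the remote server.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'source.bin')
with open(src_path, 'wb') as f:
    f.write(b'x' * 100000)
url = 'file://' + src_path

progress = []

def reporthook(block_num, block_size, total_size):
    # urlretrieve calls this with the number of blocks read so far, the
    # block size in bytes, and the total size from the Content-Length
    # header (-1 if the server did not send one).
    if total_size > 0:
        progress.append(min(block_num * block_size, total_size))

dest_path = os.path.join(tmp_dir, 'copy.bin')
file_name, headers = urllib.request.urlretrieve(url, dest_path, reporthook)
print(os.path.getsize(file_name))  # same size as the source file
```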

But keep in mind that urlretrieve is considered legacy and might become deprecated (not sure why, though).

So the most correct way to do this would be to use the urllib.request.urlopen function, which returns a file-like object representing the HTTP response, and copy that object to a real file using shutil.copyfileobj.

import urllib.request
import shutil
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
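The reason this approach scales to large files is that shutil.copyfileobj streams in fixed-size chunks rather than loading the whole body into memory; an optional third argument sets the chunk size. An offline sketch, with in-memory BytesIO objects standing in for the HTTP response and the output file:

```python
import io
import shutil

# In-memory stand-ins: any readable file-like object works as the
# source, so BytesIO mimics both the HTTP response and the output file.
fake_response = io.BytesIO(b'a' * 300000)
out_file = io.BytesIO()

# Copy in 64 KiB chunks; only one chunk is held in memory at a time.
shutil.copyfileobj(fake_response, out_file, 64 * 1024)

print(len(out_file.getvalue()))  # 300000
```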

If this seems too complicated, you may want to go simpler and store the whole download in a bytes object and then write it to a file. But this works well only for small files.

import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    data = response.read() # a `bytes` object
    out_file.write(data)


It is possible to extract .gz (and maybe other formats) compressed data on the fly, but such an operation probably requires the HTTP server to support random access to the file.

import urllib.request
import gzip
...
# Read the first 64 bytes of the file inside the .gz archive located at `url`
url = 'http://example.com/something.gz'
with urllib.request.urlopen(url) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_header = uncompressed.read(64) # a `bytes` object
        # Or do anything shown above using `uncompressed` instead of `response`.
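The same GzipFile(fileobj=...) pattern works with any readable file-like object, which makes it easy to try offline. In the sketch below a BytesIO of gzip-compressed data stands in for the HTTP response; the payload contents are illustrative only.

```python
import gzip
import io

# A BytesIO of compressed data stands in for the HTTP response object.
payload = b'JARHDR' + b'x' * 1000           # illustrative content
compressed = io.BytesIO(gzip.compress(payload))

with gzip.GzipFile(fileobj=compressed) as uncompressed:
    file_header = uncompressed.read(6)      # decompressed on the fly

print(file_header)  # b'JARHDR'
```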
