How can I retrieve files with User-Agent headers in Python 3?

Question

I'm trying to write a (simple) piece of code to download files off the internet. The problem is, some of these files are on websites that block the default Python User-Agent header. For example:

import urllib.request as html
html.urlretrieve('http://stackoverflow.com', 'index.html')

returns

urllib.error.HTTPError: HTTP Error 403: Forbidden

Normally, I would set the headers in the request, such as:

import urllib.request as html
request = html.Request('http://stackoverflow.com', headers={"User-Agent":"Firefox"})
response = html.urlopen(request)

However, since urlretrieve only accepts a URL string (not a Request object), this isn't an option.

Are there any simple-ish solutions to this (that don't include importing a library such as requests)? I've noticed that urlretrieve is part of the legacy interface ported over from Python 2; is there anything I should be using instead?

I tried creating a custom FancyURLopener class to handle retrieving files, but that caused more problems than it solved, such as creating empty files for links that 404.

Answer

You can subclass URLopener and set the version class variable to a different user-agent, then continue using urlretrieve.
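
For example, here is a minimal sketch of that approach. The class name FirefoxOpener is just an illustration, and note that URLopener belongs to the same legacy interface, so it emits a DeprecationWarning on Python 3:

import urllib.request

# URLopener builds its User-Agent header from the `version` class
# attribute, so overriding it changes the header on every request
# made through this opener.
class FirefoxOpener(urllib.request.URLopener):
    version = 'Firefox'

opener = FirefoxOpener()
opener.retrieve('http://stackoverflow.com', 'index.html')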

Or you can simply use your second method and save the response to a file only after checking that code == 200.
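
A sketch of that second approach, reusing the Request from your example: urlopen raises HTTPError for statuses like 403 or 404, so no empty file is ever created for a bad link, and the status check is just an extra guard.

import shutil
import urllib.request

request = urllib.request.Request('http://stackoverflow.com',
                                 headers={'User-Agent': 'Firefox'})
# urlopen raises HTTPError on 4xx/5xx responses, so execution only
# reaches the write below for a successful request.
with urllib.request.urlopen(request) as response:
    if response.status == 200:
        with open('index.html', 'wb') as out_file:
            # Stream the body to disk without loading it all into memory.
            shutil.copyfileobj(response, out_file)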
