Changing user agent on urllib2.urlopen
Question
How can I download a webpage with a user agent other than the default one on urllib2.urlopen?
Recommended answer
The short story: You can use Request.add_header to do this.
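As a minimal sketch of the add_header approach: the URL and the user agent string below are placeholders chosen for illustration, and build_request and fetch are hypothetical helper names, not part of urllib2. The import fallback is an assumption added so the same code also runs on Python 3, where urllib2's Request and urlopen live in urllib.request.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of Request/urlopen

# Placeholder values for illustration only.
URL = "http://www.example.com/"
USER_AGENT = "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"


def build_request(url, user_agent):
    """Return a Request whose User-Agent header replaces the default."""
    request = urllib2.Request(url)
    request.add_header("User-Agent", user_agent)
    return request


def fetch(url, user_agent):
    """Download the page body using the custom user agent."""
    response = urllib2.urlopen(build_request(url, user_agent))
    try:
        return response.read()
    finally:
        response.close()


request = build_request(URL, USER_AGENT)
# html = fetch(URL, USER_AGENT)  # performs the actual network request
```

Note that add_header stores the key in capitalized form, so the value is read back with request.get_header("User-agent").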
You can also pass the headers as a dictionary when creating the Request itself, as the docs note:
headers should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments. This is often used to "spoof" the User-Agent header, which is used by a browser to identify itself – some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", while urllib2's default user agent string is "Python-urllib/2.6" (on Python 2.6).
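The dictionary form described in the docs could look roughly like this. The URL and user agent string are placeholders, build_request_with_headers is a hypothetical helper name, and the import fallback is an assumption so the sketch also runs on Python 3 (urllib.request).

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of Request/urlopen

# Placeholder values for illustration only.
PAGE_URL = "http://www.example.com/"
CUSTOM_AGENT = "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"


def build_request_with_headers(url, user_agent):
    """Return a Request with the User-Agent set via the headers dict."""
    headers = {"User-Agent": user_agent}
    return urllib2.Request(url, headers=headers)


req = build_request_with_headers(PAGE_URL, CUSTOM_AGENT)
# page = urllib2.urlopen(req).read()  # performs the actual network request
```

Both forms end up with the same header on the request; the dictionary is convenient when several headers are set at once.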