Trying to access the Internet using urllib2 in Python

Question

I'm trying to write a program that will (among other things) get text or source code from a predetermined website. I'm learning Python to do this, and most sources have told me to use urllib2. Just as a test, I tried this code:

import urllib2
response = urllib2.urlopen('http://www.python.org')
html = response.read()

Instead of acting in any expected way, the shell just sits there as if it were waiting for input. There isn't even a ">>>" or "..." prompt. The only way to get out of this state is [Ctrl]+C. When I do that, I get a whole bunch of error messages, like:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/m/mls/pkg/ix86-Linux-RHEL5/lib/python2.5/urllib2.py", line 124, in urlopen
    return _opener.open(url, data)
  File "/m/mls/pkg/ix86-Linux-RHEL5/lib/python2.5/urllib2.py", line 381, in open
    response = self._open(req, data)

I'd appreciate any feedback. Is there a tool other than urllib2 that I should use, or can you give advice on how to fix this? I'm using a network computer at work, and I'm not entirely sure how the shell is configured or how that might affect anything.

Answer

It is almost certainly a proxy issue. Python is bad at detecting which HTTP proxy to use (on Linux it only looks at environment variables such as http_proxy), and when it tries to connect directly on a network that requires a proxy, it just hangs and eventually times out.
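
One way to test the proxy theory is sketched below with Python 3's urllib.request, the successor of urllib2 (the asker's Python 2.5 has neither urllib.request nor urlopen's timeout argument, which arrived in 2.6). It prints which proxies Python has detected and fetches with an explicit timeout, so a blocked connection raises an error instead of hanging. The helper names and the 10-second default are illustrative choices, not part of any library.

```python
import socket
import urllib.error
import urllib.request


def detected_proxies():
    """Return the proxy mapping Python found (on Linux: the *_proxy environment variables)."""
    return urllib.request.getproxies()


def fetch(url, timeout=10):
    """Fetch url, giving up after `timeout` seconds instead of hanging indefinitely."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout) as exc:
        # A timeout (rather than a quick answer or refusal) often points to a missing proxy.
        raise RuntimeError(f"could not fetch {url!r}: {exc}") from exc
```

For example, `print(detected_proxies())` gives an empty dict on a Linux machine with no *_proxy variables set; if `fetch('http://www.python.org')` then fails with a timeout, the proxy is the likely culprit.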

So first you have to find out which proxy you should be using. Check your browser's options (in IE: Tools -> Internet Options -> Connections -> LAN settings...; other browsers have equivalent settings). If it uses an automatic configuration script, you'll have to fetch that script (it should be some sort of JavaScript) and work out where your requests are supposed to go. If no script is specified and the "Automatically detect settings" option is ticked, you might as well just ask someone in your company's IT department.
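
Python's standard library cannot run such a proxy auto-config (PAC) script. Once you have read the script and worked out what its FindProxyForURL function returns for your target site, a small helper like the hypothetical one below can turn that return string into a proxy URL for Python. It assumes the standard PAC return format: entries such as "PROXY host:port" or "DIRECT", separated by semicolons and tried in order; other entry types (e.g. SOCKS) are simply skipped.

```python
def pac_result_to_proxy_url(pac_result):
    """Turn a PAC FindProxyForURL() return value into a proxy URL.

    Returns None if DIRECT comes first (no proxy needed).
    Only PROXY and DIRECT entries are understood; anything else is skipped.
    """
    for entry in pac_result.split(";"):
        parts = entry.split()
        if not parts:
            continue  # tolerate empty segments such as a trailing ';'
        kind = parts[0].upper()
        if kind == "DIRECT":
            return None
        if kind == "PROXY" and len(parts) == 2:
            return "http://" + parts[1]
    raise ValueError(f"no PROXY or DIRECT entry in {pac_result!r}")
```

The result can then go into a proxies mapping such as `{'http': proxy_url}`.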

I assume you're using Python 2.x. From the Python docs on urllib:

import urllib

# Use http://www.someproxy.com:3128 for http proxying
proxies = {'http': 'http://www.someproxy.com:3128'}
some_url = 'http://www.python.org'  # the page you want to fetch
filehandle = urllib.urlopen(some_url, proxies=proxies)

Note that the documentation's point about ProxyHandler figuring out default values (from the *_proxy environment variables) describes what already happens when you call urlopen, so relying on those defaults is probably not going to work.
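
To see why, note that a ProxyHandler created without arguments simply picks up the same settings that urlopen already uses. The small check below illustrates this with Python 3's urllib.request (the successor of urllib2); the proxy address placed in the environment is a made-up example.

```python
import os
import urllib.request

# Pretend the shell exported a proxy (made-up address, for illustration only).
os.environ["http_proxy"] = "http://proxy.example.com:3128"

# With no argument, ProxyHandler falls back to urllib.request.getproxies().
default_handler = urllib.request.ProxyHandler()
print(default_handler.proxies)
print(default_handler.proxies == urllib.request.getproxies())
```

If the environment holds no proxy (or the wrong one), the default handler inherits exactly that problem, which is why the proxy has to be given explicitly.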

If you really want to use urllib2, you'll have to specify a ProxyHandler explicitly, like the example on this page. Authentication may or may not be required (usually it isn't).
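
A minimal sketch of that approach follows, written for Python 3, where urllib2 became urllib.request (Python 2's urllib2 has classes with the same names). The proxy address reuses the placeholder from the urllib docs example, and the helper name, user name, and password are hypothetical; leave the credentials out if your proxy does not ask for authentication.

```python
import urllib.request


def make_proxy_opener(proxy_url, user=None, password=None):
    """Build an opener that sends http and https traffic through proxy_url.

    user/password are only needed if the proxy asks for Basic authentication.
    """
    handlers = [urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})]
    if user is not None:
        passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        passwords.add_password(None, proxy_url, user, password)
        handlers.append(urllib.request.ProxyBasicAuthHandler(passwords))
    return urllib.request.build_opener(*handlers)


opener = make_proxy_opener("http://www.someproxy.com:3128")
# From now on, plain urllib.request.urlopen() calls go through this opener.
urllib.request.install_opener(opener)
```

After install_opener, the original test becomes `urllib.request.urlopen('http://www.python.org', timeout=10).read()`. Installing the opener globally keeps the rest of the program unchanged; if only some requests should use the proxy, call `opener.open(url)` instead.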