Python Mechanize won't open these sites


Question

I'm working with Python's Mechanize module. I've come across 3 different sites that mechanize cannot open directly:

  1. en.wikipedia.org/wiki/Dog (new user, can't post more than 2 links)
  2. http://www.cpsc.gov/cpscpub/prerel/prhtml03/03059.html

import mechanize
br = mechanize.Browser()
br.set_handle_robots(False)  # do not fetch or obey robots.txt

Adding the following code allows mechanize to open and parse the Wikipedia article and the Google search results:

    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')] 

But my workarounds are no match for the CPSC.gov website. When I try to open it with the mechanize Browser, Python freezes, to the point where I can't even interrupt it from the keyboard.

What's going on here?

Recommended answer

In the case of the cpsc.gov site, it looks like there's a refresh header that isn't being correctly processed by mechanize's HTTPRefreshProcessor. However, you can work around the problem as follows:

import mechanize

url = 'http://www.cpsc.gov/cpscpub/prerel/prhtml03/03059.html'
br = mechanize.Browser()
br.set_handle_refresh(False)
br.open(url)
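
For context, a Refresh header value generally has the form `<seconds>; url=<target>`. The sketch below uses only the standard library to show how such a value splits into a delay and a target. `parse_refresh` is a hypothetical helper written for illustration; it is not part of mechanize and does not reproduce what HTTPRefreshProcessor does internally.

```python
def parse_refresh(value):
    """Split a Refresh header value into (delay_seconds, target_url_or_None).

    Hypothetical helper for illustration only; not part of mechanize.
    """
    # Everything before the first ';' is the delay in seconds.
    delay_part, _, rest = value.partition(";")
    delay = float(delay_part.strip())

    # The optional remainder looks like "url=<target>", sometimes quoted.
    rest = rest.strip()
    url = None
    if rest.lower().startswith("url="):
        url = rest[4:].strip().strip("'\"")
    return delay, url or None


print(parse_refresh("0; url=http://example.com/next"))  # (0.0, 'http://example.com/next')
print(parse_refresh("5"))                               # (5.0, None)
```

A client that honors the header waits for the delay and then requests the target. Calling `br.set_handle_refresh(False)`, as in the answer, makes mechanize skip that step and return the page as served.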
