How to make mechanize not fail with forms on this page?

Views: 89  Posted: 2016/7/27 21:34:12  Tags: python, automation, screen-scraping, mechanize


Question

import mechanize

url = 'http://steamcommunity.com'
br = mechanize.Browser(factory=mechanize.RobustFactory())
br.open(url)

print br.request
print br.form
for each in br.forms():
    print each
    print

The above code results in:

Traceback (most recent call last):
  File "./mech_test.py", line 12, in <module>
    for each in br.forms():
  File "build/bdist.linux-i686/egg/mechanize/_mechanize.py", line 426, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 559, in forms
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 228, in forms
mechanize._html.ParseError

My specific goal is to use the login form, but I can't even get mechanize to recognize that there are any forms. Even using what I think is the most basic method of selecting any form, br.select_form(nr=0), results in the same traceback. The form's enctype is multipart/form-data if that makes a difference.

I guess it all boils down to a two-part question: how can I get mechanize to work with this page, or, if that's not possible, what's another way to do this while maintaining cookies?

Edit: As mentioned below, this redirects to 'https://steamcommunity.com'.

Mechanize can successfully retrieve the HTML, as can be seen with the following code:

import mechanize

url = 'https://steamcommunity.com'

# HTTPS handler with wire-level debug output switched on
hh = mechanize.HTTPSHandler()
hh.set_http_debuglevel(1)
opener = mechanize.build_opener(hh)
response = opener.open(url)
contents = response.readlines()

print contents

Solution

Use this trick; I'm sure it will work for you ;)

br = mechanize.Browser(factory=mechanize.DefaultFactory(i_want_broken_xhtml_support=True))
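On the second part of the question (another way while maintaining cookies), here is a minimal sketch that does not use mechanize at all. It assumes Python 3 and only the standard library, which differs from the Python 2 code in the question. The names FormLister, make_cookie_opener and list_forms are made up for this illustration, and the sample HTML is invented rather than taken from the Steam page.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


class FormLister(HTMLParser):
    """Collect each <form>'s attributes and the names of its fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"attrs": attrs, "fields": []}
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def make_cookie_opener():
    """Return (opener, jar); cookies set by responses are stored in jar
    and sent back on later requests made through the same opener."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def list_forms(html_text):
    """Return a list of dicts describing the forms found in html_text."""
    parser = FormLister()
    parser.feed(html_text)
    parser.close()
    return parser.forms


# Offline demonstration on a small, deliberately sloppy snippet.
sample = ('<form id="login" enctype="multipart/form-data">'
          '<input name="username"><input name="password" type=password>'
          '<p>unclosed</form>')
forms = list_forms(sample)
print(forms[0]["attrs"]["enctype"], forms[0]["fields"])
# prints: multipart/form-data ['username', 'password']
```

To use it against a live page, call make_cookie_opener() once, fetch with opener.open(url).read(), decode the bytes, and pass the text to list_forms(). Submitting the login form would then mean POSTing the field values through the same opener so the session cookies are reused; that step is not shown here.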
