Mechanize br.submit() limitations?


Question

I want to submit a search query to a website using Mechanize and analyse the results with BeautifulSoup. The script will only ever be used against the same website, so form names and the like can be hardcoded. My initial attempt, shown below, is not working:


import mechanize
import urllib2
#from bs4 import BeautifulSoup


def inspect_page(url):
    br = mechanize.Browser(factory=mechanize.RobustFactory())
    br.set_handle_robots(False)
    br.addheaders = [('User-agent',
                      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6')]
    br.set_handle_redirect(True)  # follow HTTP redirects

    try:
        br.open(url)
    except mechanize.HTTPError as e:
        print "HTTP Error", e.code
        return  # nothing to inspect if the page failed to load
    except urllib2.URLError as e:
        print "URL Error", e.reason
        return

    for form in br.forms():
        print form

    br.select_form(name="dataform")
    br.form['pcode'] = 'WV14 8EW'
    br.form['premise'] = '66'
    response = br.submit()
    print response.read()

    #soup = BeautifulSoup(response.read())

inspect_page('http://www.fensa.co.uk/asp/certificate.asp')

This did not take me to the results page: print response.read() displayed the HTML of the page I submitted the query on, so I assumed I had made an error in my code. However, when I tested another site (inspect_page('https://publicaccess.glasgow.gov.uk/online-applications/search.do?action=simple')) and changed the form name and field to match those on that site:

br.select_form(name="searchCriteriaForm")
br.form['searchCriteria.simpleSearchString'] = 'Queen Elizabeth Gardens'
response = br.submit()
print response.read()

I was redirected as expected. Is there anything that would stop a page from being redirected when br.submit() is called? I have already checked that the site is not gzipped.

Recommended answer

One limitation is that mechanize doesn't know about JavaScript. On the site used in your script, clicking Search in a browser triggers a JavaScript function which validates the input and changes the action attribute of the <form> before actually submitting the form values. mechanize never runs that function, so br.submit() sends the values to the form's original action, which would explain why you get the search page back.

Here is the relevant HTML from the form:

<a onclick="return validate_required()" name="submit" href="#">
  <input class="button" type="button" value="Search" name="Submit2">
</a>
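A quick way to spot this kind of form before blaming the script is to look for onclick handlers in the page source. The following is only a minimal sketch using the standard library's HTML parser; find_onclick_handlers is a hypothetical helper name, not part of mechanize, and it only finds inline onclick attributes, not handlers attached from a separate script.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2, as in the question


class OnclickFinder(HTMLParser):
    """Collects (tag, handler) pairs for every inline onclick attribute."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.handlers = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "onclick":
                self.handlers.append((tag, value))


def find_onclick_handlers(html):
    # Hypothetical helper: returns a list of (tag, handler source) tuples.
    finder = OnclickFinder()
    finder.feed(html)
    return finder.handlers
```

If this returns anything for the element that submits the form, the submission probably depends on JavaScript that mechanize will not execute.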

And this is the validate_required() function, defined near the beginning of that HTML document:

function validate_required() {

    error = "";
    if (document.getElementById("pcode").value == '') { error = error + "Postcode\n"; }
    if (document.getElementById("premise").value == '') { error = error + "Premise\n"; }

    if (error != '') {
        alert("Please enter:\n\n" + error);
        return false;
    }
    else {
        document.dataform.action = "certificate_results.asp";
        document.dataform.submit();

    }
}
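Since validate_required() only checks the two fields and then points the form at certificate_results.asp, one possible workaround is to do the same thing by hand before submitting. This is a sketch under assumptions, not a verified fix: submit_with_action is a hypothetical helper, br is assumed to be a mechanize.Browser that has already opened the search page, and whether the server accepts the request this way has not been tested.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as in the question


def submit_with_action(br, form_name, fields, action):
    """Fill a form, override its action attribute, then submit it.

    br is expected to behave like a mechanize.Browser with a page open.
    """
    br.select_form(name=form_name)
    for name, value in fields.items():
        br.form[name] = value
    # Mimic what the page's JavaScript does in a real browser:
    # resolve the new action against the current page URL.
    br.form.action = urljoin(br.geturl(), action)
    return br.submit()
```

In the question's script this would replace the select_form/submit lines, for example: response = submit_with_action(br, "dataform", {'pcode': 'WV14 8EW', 'premise': '66'}, "certificate_results.asp").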
