Mechanize br.submit() limitations?
Question
My intention is to submit a search query to a website using Mechanize and to analyse the results using BeautifulSoup. This will be used for the same website and so form names etc. can be hardcoded. I was having issues with my initial query, which is shown below:
import mechanize
import urllib2
#from bs4 import BeautifulSoup

def inspect_page(url):
    br = mechanize.Browser(factory=mechanize.RobustFactory())
    br.set_handle_robots(False)
    br.addheaders = [('User-agent',
                      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6')]
    br.set_handle_redirect(mechanize.HTTPRedirectHandler)
    try:
        br.open(url)
    except mechanize.HTTPError, e:
        print "HTTP Error", e.code,
    except urllib2.URLError as e:
        print "URL Error", e.reason,
        return
    for form in br.forms():
        print form
    br.select_form(name="dataform")
    br.form['pcode'] = 'WV14 8EW'
    br.form['premise'] = '66'
    response = br.submit()
    print response.read()
    #soup = BeautifulSoup(response.read())

inspect_page('http://www.fensa.co.uk/asp/certificate.asp')
This did not redirect to the results page, and print response.read() displayed the HTML of the page I submitted the query on, so I assumed I had made an error in my code. However, when I tested another site (inspect_page('https://publicaccess.glasgow.gov.uk/online-applications/search.do?action=simple')) and changed the form and field names to match those on that site:
br.select_form(name="searchCriteriaForm")
br.form['searchCriteria.simpleSearchString'] = 'Queen Elizabeth Gardens'
response = br.submit()
print response.read()
I was redirected as I expected. Is there anything that would stop a page being redirected when br.submit() is called? I've already checked that the site is not GZipped.
Recommended answer
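One limitation is that mechanize doesn't know about JavaScript. On the site in your script, submitting the search form in a browser triggers a JavaScript function which validates the input and changes the action attribute of the <form> before actually submitting the form values. mechanize never runs that function, so br.submit() sends the values to the form's original action, which would explain why you get the search page back.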
Here is the HTML part of the form:
<a onclick="return validate_required()" name="submit" href="#">
<input class="button" type="button" value="Search" name="Submit2">
</a>
And this is the validate_required() function defined near the beginning of that HTML document:
function validate_required() {
    error = "";
    if (document.getElementById("pcode").value == '') { error = error + "Postcode\n"; }
    if (document.getElementById("premise").value == '') { error = error + "Premise\n"; }
    if (error != '') {
        alert("Please enter:\n\n" + error);
        return false;
    }
    else {
        document.dataform.action = "certificate_results.asp";
        document.dataform.submit();
    }
}
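Since mechanize won't run validate_required() for you, one possible workaround is to do by hand what the function does: check the two fields, point the form at certificate_results.asp, then submit. The following is only a rough sketch, written for Python 3 with a recent mechanize release rather than the Python 2 of the question. The helper names resolve_action, missing_fields and search are made up for illustration, and it assumes the server needs nothing beyond the changed action (no cookies or hidden fields set by other scripts), which I have not verified.

```python
from urllib.parse import urljoin

SEARCH_URL = "http://www.fensa.co.uk/asp/certificate.asp"
RESULTS_ACTION = "certificate_results.asp"  # the value validate_required() assigns


def resolve_action(page_url, action):
    """Resolve a relative form action against the page URL, as a browser would."""
    return urljoin(page_url, action)


def missing_fields(pcode, premise):
    """Mirror the checks in validate_required(): both fields must be non-empty."""
    missing = []
    if pcode == "":
        missing.append("Postcode")
    if premise == "":
        missing.append("Premise")
    return missing


def search(pcode, premise):
    """Hypothetical helper: submit the search the way the page's JavaScript would."""
    import mechanize  # third-party; imported here so the helpers above work without it

    missing = missing_fields(pcode, premise)
    if missing:
        raise ValueError("Please enter: " + ", ".join(missing))

    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.open(SEARCH_URL)
    br.select_form(name="dataform")
    br.form["pcode"] = pcode
    br.form["premise"] = premise
    # Do by hand what the JavaScript does just before it submits the form.
    br.form.action = resolve_action(br.geturl(), RESULTS_ACTION)
    return br.submit().read()
```

Calling search('WV14 8EW', '66') would then return the raw response body, which you can hand to BeautifulSoup. If the results page still doesn't come back, the site probably relies on more JavaScript than this one function, and a tool that drives a real browser is likely the better fit.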