Screen scraping ASPX with Python Mechanize - JavaScript form submission

Views: 208 | Published: 2016/6/5 17:43:04 | Tags: asp.net python mechanize scraperwiki

Question

I'm trying to scrape the UK Food Ratings Agency's ASPX search results pages (e.g. http://ratings.food.gov.uk/QuickSearch.aspx?q=po30 ) using Mechanize/Python on ScraperWiki ( http://scraperwiki.com/scrapers/food_standards_agency/ ), but I run into a problem when trying to follow the "Next" page links, which have the form:

<input type="submit" name="ctl00$ContentPlaceHolder1$uxResults$uxNext" value="Next >" id="ctl00_ContentPlaceHolder1_uxResults_uxNext" title="Next >" />

The form handler looks like:

<form method="post" action="QuickSearch.aspx?q=po30" onsubmit="javascript:return WebForm_OnSubmit();" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ctl00_ContentPlaceHolder1_buttonSearch')" id="aspnetForm">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__LASTFOCUS" id="__LASTFOCUS" value="" />

An HTTP trace taken when I manually click the Next link shows __EVENTTARGET as empty. All the examples I can find in other scrapers manipulate __EVENTTARGET as the way of handling Next pages.

Indeed, I'm not sure how the page I want to scrape ever loads the next page. Whatever I throw at the scraper, it only ever manages to load the first results page. (Even being able to change the number of results per page would be useful, but I can't see how to do that either!)

So - any ideas on how to scrape the 1+N'th results pages for N>0?

Recommended answer

Mechanize doesn't handle JavaScript, but for this particular case it isn't needed.
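One way to see why JavaScript isn't needed: the Next control is an ordinary submit input, so pressing it posts the button's own name/value pair next to the hidden fields, and __EVENTTARGET can stay empty. The sketch below rebuilds such a POST body with the standard library only. It is an illustration, not a replay of the real request: it uses just the three hidden fields quoted in the question, and `build_post_body` is a hypothetical helper name.

```python
from urllib.parse import urlencode, parse_qs

# Hidden fields exactly as quoted in the question; any other hidden
# fields the live page may carry are deliberately left out here.
hidden_fields = {
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "__LASTFOCUS": "",
}

# Name and value of the "Next" submit input from the question.
next_button = ("ctl00$ContentPlaceHolder1$uxResults$uxNext", "Next >")


def build_post_body(hidden, button):
    """Return a URL-encoded body: hidden fields plus the pressed button."""
    fields = dict(hidden)      # copy so the caller's dict is not modified
    name, value = button
    fields[name] = value       # the pressed submit button travels as a normal field
    return urlencode(fields)


body = build_post_body(hidden_fields, next_button)
print(body)

# Decoding the body again shows an empty __EVENTTARGET next to the button field.
decoded = parse_qs(body, keep_blank_values=True)
print(decoded["__EVENTTARGET"], decoded[next_button[0]])
```

This matches the HTTP trace described in the question: the server can tell which button was pressed from the button field alone.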

First we open the result page with mechanize:

import mechanize

url = 'http://ratings.food.gov.uk/QuickSearch.aspx?q=po30'
br = mechanize.Browser()
br.set_handle_robots(False)  # do not fetch or obey robots.txt
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.open(url)
response = br.response().read()  # HTML of the first results page

Then we select the aspnet form:

br.select_form(nr=0) #Select the first (and only) form - it has no name so we reference by number

The form has 5 submit buttons - we want to submit the one that takes us to the next result page:

response = br.submit(name='ctl00$ContentPlaceHolder1$uxResults$uxNext').read()  #"Press" the next submit button

The other submit buttons in the form are:

ctl00$uxLanguageSwitch # Switch language to Welsh
ctl00$ContentPlaceHolder1$uxResults$Button1 # Search submit button
ctl00$ContentPlaceHolder1$uxResults$uxFirst # First result page
ctl00$ContentPlaceHolder1$uxResults$uxPrevious # Previous result page
ctl00$ContentPlaceHolder1$uxResults$uxLast # Last result page
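Pressing Next repeatedly covers the 1+N'th pages asked about in the question. The following is a minimal sketch of such a loop, not a tested scraper for this site: `open_first_page` and `iter_result_pages` are hypothetical helper names, and because the answer does not say how the site signals the last page, the loop simply stops after an assumed `max_pages` cap. The form has to be selected again on every page because each submit loads a new document.

```python
NEXT_BUTTON = 'ctl00$ContentPlaceHolder1$uxResults$uxNext'


def open_first_page(url):
    """Open the search URL with mechanize and return (browser, first page HTML)."""
    # Imported lazily so the paging helper below can be tried without mechanize.
    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.open(url)
    return br, br.response().read()


def iter_result_pages(br, first_html, next_name=NEXT_BUTTON, max_pages=5):
    """Yield the HTML of up to max_pages result pages, starting with first_html.

    `br` only needs select_form(nr=...) and submit(name=...), as used in the
    answer. max_pages is an assumed safety cap, not something the site defines.
    """
    html = first_html
    for page_number in range(1, max_pages + 1):
        yield html
        if page_number == max_pages:
            break
        br.select_form(nr=0)                     # re-select the form on the new page
        html = br.submit(name=next_name).read()  # "press" Next and read the result
```

A typical call would be `br, html = open_first_page(url)` followed by `for page in iter_result_pages(br, html): ...`, parsing each page inside the loop.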

In mechanize we can get form info like this:

for form in br.forms():
    print(form)  # print() with parentheses works on both Python 2 and Python 3
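To list only the submit buttons rather than the whole form dump, one option is to filter the form's controls by type. This sketch relies on mechanize form objects exposing a `controls` list whose items have `type` and `name` attributes; `submit_button_names` is a hypothetical helper, and it accepts any object with that shape.

```python
def submit_button_names(form):
    """Return the names of all controls of type 'submit' in a form."""
    return [control.name for control in form.controls
            if control.type == 'submit']
```

With the browser from the answer this could be called as `br.select_form(nr=0)` followed by `submit_button_names(br.form)`.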
