Scraping a response from a selected option in a dropdown list

Question

This is an example of a page that lists baseball stats for a selected player, defaulting to the most recent year (2014, soon to be 2015) http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325

The drop-down list lets the user select years back to 2010, but selecting one does not change the displayed URL. How can I scrape the stats for all the available years, one request for each value in the drop-down list?

I'm currently using Python and BeautifulSoup, but I'm willing to use whatever will get the job done.

<select name="ctl00$ctl00$cphContainer$cphContents$ddlYear"     
        onchange="javascript:setTimeout(&#39;__doPostBack(\&#39;ctl00$ctl00$cphContainer$cphContents$ddlYear\&#39;,\&#39;\&#39;)&#39;, 0)" 
        id="cphContainer_cphContents_ddlYear" 
        class="select02 mgt30">
<option value="2014">2014</option>
<option value="2013">2013</option>
<option selected="selected" value="2012">2012</option>
<option value="2011">2011</option>
<option value="2010">2010</option>
</select>


Recommended answer

Do it in two steps:

  • make a GET request, parse the HTML and extract the form input values
  • make a POST request passing those input values along with the ctl00$ctl00$cphContainer$cphContents$ddlYear parameter, which selects the year
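
To make the second step concrete, here is a minimal sketch (Python 3 standard library only) of how the year travels in the POST body as an ordinary form field. The helper name build_form_data and the placeholder hidden value are illustrative assumptions; the real hidden values come from the GET response.

```python
from urllib.parse import urlencode

# Field name taken from the <select name="..."> in the question.
YEAR_FIELD = 'ctl00$ctl00$cphContainer$cphContents$ddlYear'


def build_form_data(hidden_fields, year):
    """Hypothetical helper: merge the hidden form fields with the chosen year."""
    data = dict(hidden_fields)      # copy, so the caller's dict is not modified
    data[YEAR_FIELD] = str(year)    # the dropdown is just another form field
    return data


# Placeholder value: the real __VIEWSTATE is read from the GET response.
hidden = {'__VIEWSTATE': 'PLACEHOLDER'}
print(urlencode(build_form_data(hidden, 2013)))
# prints: __VIEWSTATE=PLACEHOLDER&ctl00%24ctl00%24cphContainer%24cphContents%24ddlYear=2013
```

The `$` characters in the field name are percent-encoded as `%24` on the wire; requests does the same encoding when given a dict through its data argument.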

Implementation example for year 2013 (using requests and BeautifulSoup):

from bs4 import BeautifulSoup
import requests

url = 'http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325'

with requests.Session() as session:
    # send a browser-like User-Agent with every request in this session
    session.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'})

    # step 1: GET the page and read the hidden ASP.NET form fields
    response = session.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    data = {
        'ctl00$ctl00$cphContainer$cphContents$ddlYear': '2013',
        'ctl00$ctl00$txtSearchWord': '',
        '__EVENTTARGET': soup.find('input', {'name': '__EVENTTARGET'}).get('value', ''),
        '__EVENTARGUMENT': soup.find('input', {'name': '__EVENTARGUMENT'}).get('value', ''),
        '__LASTFOCUS': soup.find('input', {'name': '__LASTFOCUS'}).get('value', ''),
        '__VIEWSTATE': soup.find('input', {'name': '__VIEWSTATE'}).get('value', ''),
        '__VIEWSTATEGENERATOR': soup.find('input', {'name': '__VIEWSTATEGENERATOR'}).get('value', ''),
        '__EVENTVALIDATION': soup.find('input', {'name': '__EVENTVALIDATION'}).get('value', ''),
    }

    # step 2: POST the form back with the desired year and parse the result
    response = session.post(url, data=data)
    soup = BeautifulSoup(response.content, 'html.parser')

    for row in soup.select('table.tData01 tr'):
        # print(...) with a single argument behaves the same on Python 2 and 3
        print([td.text for td in row.find_all('td')])

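This prints the contents of all stats tables for 2013. The output below is as printed by Python 2, where the Korean team names appear as \u escapes; Python 3 prints them directly and without the u prefixes: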

[u'KIA', u'16', u'0.364', u'55', u'8', u'20', u'3', u'0', u'3', u'11', u'5', u'0', u'14', u'0', u'14', u'1']
[u'LG', u'15', u'0.321', u'53', u'7', u'17', u'1', u'0', u'2', u'9', u'1', u'1', u'6', u'3', u'10', u'2']
[u'NC', u'16', u'0.237', u'59', u'5', u'14', u'2', u'0', u'2', u'10', u'2', u'0', u'3', u'0', u'17', u'2']
[u'SK', u'16', u'0.235', u'51', u'7', u'12', u'1', u'0', u'3', u'13', u'1', u'3', u'13', u'1', u'12', u'4']
[u'\ub450\uc0b0', u'16', u'0.368', u'57', u'16', u'21', u'2', u'1', u'4', u'21', u'2', u'1', u'12', u'0', u'13', u'2']
[u'\ub86f\ub370', u'16', u'0.375', u'56', u'9', u'21', u'4', u'0', u'3', u'13', u'4', u'3', u'11', u'0', u'9', u'3']
[u'\uc0bc\uc131', u'16', u'0.226', u'62', u'8', u'14', u'5', u'0', u'3', u'10', u'0', u'0', u'8', u'1', u'15', u'1']
[u'\ud55c\ud654', u'15', u'0.211', u'57', u'7', u'12', u'3', u'0', u'2', u'9', u'0', u'0', u'1', u'1', u'19', u'3']
...
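
The question asks for every year, not only 2013. A minimal sketch of one way to extend the approach above: read the year values from the dropdown itself, then repeat the POST once per year. This is untested against the live site. The names FormParser, parse_form and fetch_all_years are hypothetical, the parser uses Python 3's standard-library html.parser instead of BeautifulSoup so the block has no third-party imports, and it assumes that resending the hidden fields from the initial GET is sufficient for each POST.

```python
from html.parser import HTMLParser

# Field name taken from the <select name="..."> in the question.
YEAR_FIELD = 'ctl00$ctl00$cphContainer$cphContents$ddlYear'


class FormParser(HTMLParser):
    """Collects hidden/text <input> values and the year dropdown's options."""

    def __init__(self):
        super().__init__()
        self.inputs = {}
        self.years = []
        self._in_year_select = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get('name') or ''
        if tag == 'input' and name:
            # An <input> without a type attribute defaults to a text field.
            kind = (attrs.get('type') or 'text').lower()
            if kind in ('hidden', 'text'):
                self.inputs[name] = attrs.get('value') or ''
        elif tag == 'select':
            self._in_year_select = (name == YEAR_FIELD)
        elif tag == 'option' and self._in_year_select and attrs.get('value'):
            self.years.append(attrs['value'])

    def handle_endtag(self, tag):
        if tag == 'select':
            self._in_year_select = False


def parse_form(html):
    """Return (form input values, list of year option values) found in html."""
    parser = FormParser()
    parser.feed(html)
    return parser.inputs, parser.years


def fetch_all_years(session, url):
    """Return {year: html} with one POST per year offered by the dropdown.

    `session` is anything with requests-style get() and post() methods,
    for example a requests.Session().
    """
    fields, years = parse_form(session.get(url).text)
    pages = {}
    for year in years:
        data = dict(fields)        # fresh copy of the form fields per request
        data[YEAR_FIELD] = year    # select the year, as in the 2013 example
        pages[year] = session.post(url, data=data).text
    return pages
```

With requests installed, calling fetch_all_years(requests.Session(), url) would return the raw HTML for each year, and each page can then be parsed with the same soup.select('table.tData01 tr') loop used in the answer.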
