从下拉列表中的选定选项中抓取响应 [英] scraping a response from a selected option in dropdown list

查看:34
本文介绍了从下拉列表中的选定选项中抓取响应的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个页面示例,其中列出了所选球员的棒球统计数据,默认为最近一年(2014 年,很快将是 2015 年)http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325

This is an example of a page that lists baseball stats for a selected player, defaulting to the most recent year (2014, soon to be 2015) http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325

下拉列表允许用户选择回溯到 2010 年的年份,但不会更改显示的 url.如何从下拉列表中的每个值中抓取所有可用年份?

The drop down list allows the user to selected years back to 2010, but doesn't not change the displayed url. How can I scrape all the available years, from each value in the drop down list?

我目前正在使用 Python 和 BeautifulSoup,但我愿意使用任何可以完成工作的东西.

I'm currently using Python and BeautifulSoup, but I'm willing to use whatever will get the job done.

<select name="ctl00$ctl00$cphContainer$cphContents$ddlYear"     
        onchange="javascript:setTimeout(&#39;__doPostBack(&#39;ctl00$ctl00$cphContainer$cphContents$ddlYear&#39;,&#39;&#39;)&#39;, 0)" 
        id="cphContainer_cphContents_ddlYear" 
        class="select02 mgt30">
<option value="2014">2014</option>
<option value="2013">2013</option>
<option selected="selected" value="2012">2012</option>
<option value="2011">2011</option>
<option value="2010">2010</option>

推荐答案

分两步做:

  • 发出 GET 请求,解析 HTML 并提取表单输入值
  • 使用负责年份的 ctl00$ctl00$cphContainer$cphContents$ddlYear 参数进行解析输入值的 POST 请求
  • make a GET request, parse HTML and extract the form input values
  • make a POST request parsing input values alongside with ctl00$ctl00$cphContainer$cphContents$ddlYear parameter which is responsible for the year

2013 年的实现示例(使用 requestsBeautifulSoup):

Implementation example for year 2013 (using requests and BeautifulSoup):

from bs4 import BeautifulSoup
import requests

url = 'http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325'

with requests.Session() as session:
    session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'}

    # parsing parameters
    response = session.get(url)
    soup = BeautifulSoup(response.content)

    data = {
        'ctl00$ctl00$cphContainer$cphContents$ddlYear': '2013',
        'ctl00$ctl00$txtSearchWord': '',
        '__EVENTTARGET': soup.find('input', {'name': '__EVENTTARGET'}).get('value', ''),
        '__EVENTARGUMENT': soup.find('input', {'name': '__EVENTARGUMENT'}).get('value', ''),
        '__LASTFOCUS': soup.find('input', {'name': '__LASTFOCUS'}).get('value', ''),
        '__VIEWSTATE': soup.find('input', {'name': '__VIEWSTATE'}).get('value', ''),
        '__VIEWSTATEGENERATOR': soup.find('input', {'name': '__VIEWSTATEGENERATOR'}).get('value', ''),
        '__EVENTVALIDATION': soup.find('input', {'name': '__EVENTVALIDATION'}).get('value', ''),
    }

    # parsing data
    response = session.post(url, data=data)

    soup = BeautifulSoup(response.content)

    for row in soup.select('table.tData01 tr'):
        print [td.text for td in row.find_all('td')]

这将打印 2013 年所有统计表的内容:

This prints the contents of all stats tables for 2013:

[u'KIA', u'16', u'0.364', u'55', u'8', u'20', u'3', u'0', u'3', u'11', u'5', u'0', u'14', u'0', u'14', u'1']
[u'LG', u'15', u'0.321', u'53', u'7', u'17', u'1', u'0', u'2', u'9', u'1', u'1', u'6', u'3', u'10', u'2']
[u'NC', u'16', u'0.237', u'59', u'5', u'14', u'2', u'0', u'2', u'10', u'2', u'0', u'3', u'0', u'17', u'2']
[u'SK', u'16', u'0.235', u'51', u'7', u'12', u'1', u'0', u'3', u'13', u'1', u'3', u'13', u'1', u'12', u'4']
[u'ub450uc0b0', u'16', u'0.368', u'57', u'16', u'21', u'2', u'1', u'4', u'21', u'2', u'1', u'12', u'0', u'13', u'2']
[u'ub86fub370', u'16', u'0.375', u'56', u'9', u'21', u'4', u'0', u'3', u'13', u'4', u'3', u'11', u'0', u'9', u'3']
[u'uc0bcuc131', u'16', u'0.226', u'62', u'8', u'14', u'5', u'0', u'3', u'10', u'0', u'0', u'8', u'1', u'15', u'1']
[u'ud55cud654', u'15', u'0.211', u'57', u'7', u'12', u'3', u'0', u'2', u'9', u'0', u'0', u'1', u'1', u'19', u'3']
...

这篇关于从下拉列表中的选定选项中抓取响应的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆