Scraping a response from a selected option in a dropdown list
Question
This is an example of a page that lists baseball stats for a selected player, defaulting to the most recent year (2014, soon to be 2015) http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325
The dropdown list lets the user select years back to 2010, but selecting a year does not change the displayed URL. How can I scrape the stats for all the available years, one for each value in the dropdown list?
I'm currently using Python and BeautifulSoup, but I'm willing to use whatever will get the job done.
<select name="ctl00$ctl00$cphContainer$cphContents$ddlYear"
onchange="javascript:setTimeout('__doPostBack(\'ctl00$ctl00$cphContainer$cphContents$ddlYear\',\'\')', 0)"
id="cphContainer_cphContents_ddlYear"
class="select02 mgt30">
<option value="2014">2014</option>
<option value="2013">2013</option>
<option selected="selected" value="2012">2012</option>
<option value="2011">2011</option>
<option value="2010">2010</option>
</select>
Recommended answer
Do it in two steps:
- make a GET request, parse the HTML, and extract the form input values
- make a POST request, passing those input values along with the ctl00$ctl00$cphContainer$cphContents$ddlYear parameter, which selects the year
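The first step can also be done generically instead of naming each hidden field by hand. The following is a minimal sketch, assuming Python 3 and using only the standard library's html.parser (BeautifulSoup would work equally well). The names FormFieldCollector and build_postback are hypothetical helpers, and sample_html is a shortened stand-in for the real page, not its actual markup.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from hidden and text <input> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Skip buttons, checkboxes, etc.; submitting them could change the result.
        if (attributes.get("type") or "text").lower() not in ("hidden", "text"):
            return
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are sent as empty strings.
            self.fields[name] = attributes.get("value") or ""


def build_postback(html, year_field, year):
    """Return POST data: the page's own inputs plus the requested year."""
    collector = FormFieldCollector()
    collector.feed(html)
    data = dict(collector.fields)
    data[year_field] = str(year)
    return data


# Hypothetical, shortened stand-in for the HTML returned by the GET request.
sample_html = """
<form>
<input type="hidden" name="__VIEWSTATE" value="abc123" />
<input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
<input type="text" name="ctl00$ctl00$txtSearchWord" />
<input type="submit" name="btnSearch" value="Search" />
</form>
"""

data = build_postback(
    sample_html, "ctl00$ctl00$cphContainer$cphContents$ddlYear", 2013
)
print(data)
```

The resulting dictionary could be passed as the data argument of the POST request in the second step. Collecting fields this way avoids a failure when one of the expected hidden inputs is missing from the page.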
Implementation example for year 2013 (using requests and BeautifulSoup):
from bs4 import BeautifulSoup
import requests

url = 'http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325'

with requests.Session() as session:
    session.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'})

    # step 1: GET the page and read the hidden ASP.NET form fields
    response = session.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    data = {
        'ctl00$ctl00$cphContainer$cphContents$ddlYear': '2013',  # year to request
        'ctl00$ctl00$txtSearchWord': '',
        '__EVENTTARGET': soup.find('input', {'name': '__EVENTTARGET'}).get('value', ''),
        '__EVENTARGUMENT': soup.find('input', {'name': '__EVENTARGUMENT'}).get('value', ''),
        '__LASTFOCUS': soup.find('input', {'name': '__LASTFOCUS'}).get('value', ''),
        '__VIEWSTATE': soup.find('input', {'name': '__VIEWSTATE'}).get('value', ''),
        '__VIEWSTATEGENERATOR': soup.find('input', {'name': '__VIEWSTATEGENERATOR'}).get('value', ''),
        '__EVENTVALIDATION': soup.find('input', {'name': '__EVENTVALIDATION'}).get('value', ''),
    }

    # step 2: POST the form back to the same URL and parse the returned tables
    response = session.post(url, data=data)
    soup = BeautifulSoup(response.content, 'html.parser')

    for row in soup.select('table.tData01 tr'):
        print([td.text for td in row.find_all('td')])
This prints the contents of all stats tables for 2013 (the output below was produced under Python 2, hence the u'' prefixes and escaped Korean team names):
[u'KIA', u'16', u'0.364', u'55', u'8', u'20', u'3', u'0', u'3', u'11', u'5', u'0', u'14', u'0', u'14', u'1']
[u'LG', u'15', u'0.321', u'53', u'7', u'17', u'1', u'0', u'2', u'9', u'1', u'1', u'6', u'3', u'10', u'2']
[u'NC', u'16', u'0.237', u'59', u'5', u'14', u'2', u'0', u'2', u'10', u'2', u'0', u'3', u'0', u'17', u'2']
[u'SK', u'16', u'0.235', u'51', u'7', u'12', u'1', u'0', u'3', u'13', u'1', u'3', u'13', u'1', u'12', u'4']
[u'\ub450\uc0b0', u'16', u'0.368', u'57', u'16', u'21', u'2', u'1', u'4', u'21', u'2', u'1', u'12', u'0', u'13', u'2']
[u'\ub86f\ub370', u'16', u'0.375', u'56', u'9', u'21', u'4', u'0', u'3', u'13', u'4', u'3', u'11', u'0', u'9', u'3']
[u'\uc0bc\uc131', u'16', u'0.226', u'62', u'8', u'14', u'5', u'0', u'3', u'10', u'0', u'0', u'8', u'1', u'15', u'1']
[u'\ud55c\ud654', u'15', u'0.211', u'57', u'7', u'12', u'3', u'0', u'2', u'9', u'0', u'0', u'1', u'1', u'19', u'3']
...
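The example above fetches a single year, while the question asks for every year in the dropdown. One possible extension is to read the option values from the select element and repeat the POST step once per value. The sketch below assumes Python 3 and uses only the standard library's html.parser. OptionValueParser, available_years, and scrape_all_years are hypothetical names, fetch_year stands for any callable that performs the GET and POST steps for one year, and sample_html is a shortened copy of the dropdown from the question.

```python
from html.parser import HTMLParser

# id of the year dropdown, taken from the HTML in the question
YEAR_SELECT_ID = "cphContainer_cphContents_ddlYear"


class OptionValueParser(HTMLParser):
    """Collect the value of each <option> inside the <select> with a given id."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "select":
            self.inside = attributes.get("id") == self.select_id
        elif tag == "option" and self.inside and attributes.get("value"):
            self.values.append(attributes["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self.inside = False


def available_years(html, select_id=YEAR_SELECT_ID):
    """Return the option values of the year dropdown, in page order."""
    parser = OptionValueParser(select_id)
    parser.feed(html)
    return parser.values


def scrape_all_years(html, fetch_year):
    """Call fetch_year(year) for each dropdown value; return {year: result}."""
    return {year: fetch_year(year) for year in available_years(html)}


# Shortened copy of the dropdown from the question (onchange attribute omitted).
sample_html = """
<select name="ctl00$ctl00$cphContainer$cphContents$ddlYear"
        id="cphContainer_cphContents_ddlYear" class="select02 mgt30">
<option value="2014">2014</option>
<option value="2013">2013</option>
<option selected="selected" value="2012">2012</option>
<option value="2011">2011</option>
<option value="2010">2010</option>
</select>
"""

years = available_years(sample_html)
print(years)

# Stand-in for the real per-year request; it only labels the year here.
results = scrape_all_years(sample_html, lambda year: "rows for " + year)
```

In real use, fetch_year would wrap the two-step request shown in the answer with the year passed in place of '2013'. It may be safest for it to take fresh hidden field values from a GET before each POST, since the server could reject stale state; that is an assumption about the site's behavior rather than something the answer confirms.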