What if I want to web scrape with R for a page with parameters?
Question
The page I would like to scrape, http://stoptb.org/countries/tbteam/searchExperts.asp, only returns data after parameters have been submitted on this page: http://stoptb.org/countries/tbteam/experts.asp. Since the parameters are not embedded in the URL, I don't know how to pass them from R. Is there a way to do this in R?
(BTW, I know next to nothing about ASP, so maybe that's the component I'm missing.)
Recommended answer
You can use the RHTMLForms package.
You may need to install it first:
# install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")
or, under Windows, you may need:
# install.packages("RHTMLForms", repos = "http://www.omegahat.org/R", type = "source")
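library(RHTMLForms)
library(RCurl)
library(XML)
# read the description of the forms on the search page
forms <- getHTMLFormDescription("http://stoptb.org/countries/tbteam/experts.asp")
# build an R function whose arguments are the fields of the sExperts form
fun <- createFunction(forms$sExperts)
# find experts with expertise in "Infection control: Engineering Consultant"
results <- fun(Expertise = "Infection control: Engineering Consultant")
# extract the results table from the returned HTML and convert it to a data frame
tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
readHTMLTable(tableData[[1]])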
#                              V1                   V2                     V3
#1                                                 <NA>                   <NA>
#2                 Name of Expert Country of Residence                  Email
#3               Girmay, Desalegn             Ethiopia    deskebede@yahoo.com
#4            IVANCHENKO, VARVARA              Estonia v.ivanchenko81@mail.ru
#5                   JAUCOT, Alex              Belgium  alex.jaucot@gmail.com
#6 Mulder, Hans Johannes Henricus              Namibia        hmulder@iway.na
#7                    Walls, Neil            Australia        neil@nwalls.com
#8                 Zuccotti, Thea                Italy     thea_zuc@yahoo.com
#                  V4
#1               <NA>
#2 Number of Missions
#3                  0
#4                  3
#5                  0
#6                  0
#7                  0
#8                  1
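The argument names passed to fun, such as Expertise, correspond to fields of the HTML form on experts.asp. Those fields travel in the request rather than in the URL you see, which is why they are not visible there. The following is a minimal offline sketch of how such field names could be discovered with the XML package. The form markup is hypothetical and simplified: only the form name sExperts and the field names Expertise and CBased come from this thread, while the method and action attributes are assumptions.

```r
library(XML)

# Hypothetical, simplified form markup. The real page's HTML is not shown in
# this thread; method = "post" and the action value are assumptions.
form_html <- '
<html><body>
  <form name="sExperts" method="post" action="searchExperts.asp">
    <select name="Expertise">
      <option value="">Any</option>
      <option value="Infection control: Engineering Consultant">Infection control: Engineering Consultant</option>
    </select>
    <select name="CBased">
      <option value="">Any</option>
      <option value="Bhutan">Bhutan</option>
    </select>
  </form>
</body></html>'

doc <- htmlParse(form_html, asText = TRUE)

# Where the form is sent, and how. With POST the fields go in the request
# body, so they never appear in the URL.
form_node   <- getNodeSet(doc, "//form[@name = 'sExperts']")[[1]]
form_action <- xmlGetAttr(form_node, "action")
form_method <- toupper(xmlGetAttr(form_node, "method", default = "get"))

# Names of the fields the form submits.
field_names <- xpathSApply(doc, "//form[@name = 'sExperts']//select",
                           xmlGetAttr, "name")

print(form_action)
print(form_method)
print(field_names)
```

For a live page, the same XPath queries could be run on the parsed HTML of experts.asp; the field names found this way are the ones to supply as arguments.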
You can also create a reader so that the generated function returns the table directly:
# reader: turn the raw HTML response into a data frame
returnTable <- function(results) {
  tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
  readHTMLTable(tableData[[1]])
}
fun <- createFunction(forms$sExperts, reader = returnTable)
fun(CBased = "Bhutan")  # find experts based in Bhutan
#                 V1                   V2                      V3
#1                                    <NA>                    <NA>
#2    Name of Expert Country of Residence                   Email
#3 Wangchuk, Lungten               Bhutan drlungten@health.gov.bt
#                  V4
#1               <NA>
#2 Number of Missions
#3                  2
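In the printed results the first row is empty and the real column titles sit in row 2, so the data frame usually needs a little cleanup before use. The sketch below shows one way to do that in base R. The function name tidy_experts and the sample values are hypothetical; the sample data frame only mimics the shape of the output shown in this answer.

```r
# Hypothetical stand-in for the data frame returned by readHTMLTable():
# an empty first row, column titles in row 2, then the data rows.
raw <- data.frame(
  V1 = c("", "Name of Expert", "Doe, Jane"),
  V2 = c(NA, "Country of Residence", "Bhutan"),
  V3 = c(NA, "Email", "jane.doe@example.org"),
  V4 = c(NA, "Number of Missions", "2"),
  stringsAsFactors = FALSE
)

tidy_experts <- function(raw) {
  # Convert every column to character so the code also works when the
  # columns were read as factors.
  raw[] <- lapply(raw, as.character)
  # Row 2 holds the column titles.
  header <- unlist(raw[2, ], use.names = FALSE)
  # Drop the empty row and the title row, then apply the titles.
  out <- raw[-(1:2), , drop = FALSE]
  names(out) <- header
  rownames(out) <- NULL
  # The mission count is numeric.
  out[["Number of Missions"]] <- as.integer(out[["Number of Missions"]])
  out
}

experts <- tidy_experts(raw)
print(experts)
```

A function like this could be applied to the value returned by fun(...), or called as the last step inside returnTable.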