What if I want to web scrape with R for a page with parameters?

Question

The page I would like to scrape here: http://stoptb.org/countries/tbteam/searchExperts.asp requires the submission of parameters in this page: http://stoptb.org/countries/tbteam/experts.asp in order to get the data out. Since the parameters are not nested in the URL, I don't know how to pass them with R. Is there a way to do this in R?

(BTW, I know next to nothing about ASP, so maybe that's the component I'm missing.)

Recommended answer

You can use the RHTMLForms package.

You may need to install it first:

# install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")

Or, under Windows, you may need to install it from source:

# install.packages("RHTMLForms", repos = "http://www.omegahat.org/R", type = "source")


 library(RHTMLForms)
 library(RCurl)
 library(XML)
 # read the form definitions from the page that hosts the search form
 forms <- getHTMLFormDescription("http://stoptb.org/countries/tbteam/experts.asp")
 # build an R function whose arguments are the fields of the sExperts form
 fun <- createFunction(forms$sExperts)
 # find experts with expertise in "Infection control: Engineering Consultant"
 results <- fun(Expertise = "Infection control: Engineering Consultant")

 # pull the results table out of the returned HTML
 tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
 readHTMLTable(tableData[[1]])

#                              V1                   V2                     V3
#1                                                <NA>                   <NA>
#2                 Name of Expert Country of Residence                  Email
#3               Girmay, Desalegn             Ethiopia    deskebede@yahoo.com
#4            IVANCHENKO, VARVARA              Estonia v.ivanchenko81@mail.ru
#5                   JAUCOT, Alex              Belgium  alex.jaucot@gmail.com
#6 Mulder, Hans Johannes Henricus              Namibia        hmulder@iway.na
#7                    Walls, Neil            Australia        neil@nwalls.com
#8                 Zuccotti, Thea                Italy     thea_zuc@yahoo.com
#                  V4
#1               <NA>
#2 Number of Missions
#3                  0
#4                  3
#5                  0
#6                  0
#7                  0
#8                  1
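The table printed above has a blank first row and the real column names in its second row. A minimal base R sketch for tidying that up follows; `promote_header` is a hypothetical helper name, not part of RHTMLForms or XML, and the sample rows are copied from the output shown in this answer.

```r
# Hypothetical helper: use one row of a scraped table as the column names
# and drop that row together with everything above it.
promote_header <- function(tbl, header_row = 2) {
  # older versions of readHTMLTable return factors; work with plain strings
  tbl[] <- lapply(tbl, as.character)
  names(tbl) <- unlist(tbl[header_row, ], use.names = FALSE)
  out <- tbl[-seq_len(header_row), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Small demo with rows taken from the printed results
raw <- data.frame(
  V1 = c("", "Name of Expert", "Walls, Neil", "Zuccotti, Thea"),
  V2 = c(NA, "Country of Residence", "Australia", "Italy"),
  V4 = c(NA, "Number of Missions", "0", "1"),
  stringsAsFactors = FALSE
)
clean <- promote_header(raw)
print(clean)
```

The numeric column is still character after this step; convert it with `as.integer()` if you need to compute on it.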

Or create a reader that returns the table directly:

 # reader: parse the returned HTML and extract the results table
 returnTable <- function(results) {
   tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
   readHTMLTable(tableData[[1]])
 }
 fun <- createFunction(forms$sExperts, reader = returnTable)
 fun(CBased = "Bhutan") # find experts based in Bhutan
#                 V1                   V2                      V3
#1                                   <NA>                    <NA>
#2    Name of Expert Country of Residence                   Email
#3 Wangchuk, Lungten               Bhutan drlungten@health.gov.bt
#                  V4
#1               <NA>
#2 Number of Missions
#3                  2
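If RHTMLForms is hard to install, the same request can probably be sent by posting the form fields yourself with `RCurl::postForm`. The sketch below rests on several assumptions that are not confirmed here: that the form on experts.asp submits with the POST method to searchExperts.asp, that `Expertise` and `CBased` (the argument names used above) are the raw field names, and that the server accepts the display text as the field value. `search_experts` is a hypothetical name.

```r
# Hypothetical helper: post the form fields directly instead of generating
# a function with RHTMLForms. Requires the RCurl and XML packages.
search_experts <- function(...,
                           url = "http://stoptb.org/countries/tbteam/searchExperts.asp") {
  # Assumption: the form uses method = "post" with URL-encoded fields
  html <- RCurl::postForm(url, .params = list(...), style = "POST")
  doc <- XML::htmlParse(html, asText = TRUE)
  tables <- XML::getNodeSet(doc, "//table[@class = 'data']")
  if (length(tables) == 0) stop("no table with class 'data' in the response")
  XML::readHTMLTable(tables[[1]], stringsAsFactors = FALSE)
}

# Usage (needs network access; not run here):
# search_experts(CBased = "Bhutan")
```

To check the assumptions, inspect the `<form>` element in the page source or in the browser's network tab: its `method` and `action` attributes, and the `name` and `value` attributes of each input and option, tell you exactly what to send.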
