R - How to make a click on a webpage using rvest or RCurl


Question

I want to scrape data from this webpage.

The data can be easily scraped with rvest.

The code could look like this:

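library(rvest)
library(pipeR)  # provides the %>>% pipe operator

url <- "http://www.tradingeconomics.com/"
css <- "#ctl00_ContentPlaceHolder1_defaultUC1_CurrencyMatrixAllCountries1_GridView1"

# Download the page, select the table node, and parse it into a data frame
data <- url %>>%
  read_html() %>>%  # called html() in older rvest releases
  html_nodes(css) %>>%
  html_table()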

But there is a problem with webpages like this.

There is a + button that shows the data for all countries, but by default only 50 countries are displayed.

So if I use the code above, I can only scrape the data for 50 countries.

The + button is implemented in JavaScript, so I want to know if there is a way in R to click the button and then scrape the data.

Recommended answer

Sometimes it's better to attack the problem at the Ajax web-request level. For this site, you can use Chrome's dev tools and watch the requests. To build the table (including the full table behind the + button), the page makes a POST to the site with various Ajax-specific parameters. Just replicate that request, do a bit of data munging on the response, and you're good to go:

library(httr)
library(rvest)
library(dplyr)

# Replicate the POST request observed in the browser's dev tools
res <- POST("http://www.tradingeconomics.com/",
            encode="form",
            user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.50 Safari/537.36"),
            add_headers(`Referer`="http://www.tradingeconomics.com/",
                        `X-MicrosoftAjax`="Delta=true"),
            body=list(
              `ctl00$AjaxScriptManager1$ScriptManager1`="ctl00$ContentPlaceHolder1$defaultUC1$CurrencyMatrixAllCountries1$UpdatePanel1|ctl00$ContentPlaceHolder1$defaultUC1$CurrencyMatrixAllCountries1$LinkButton1",
              `__EVENTTARGET`="ctl00$ContentPlaceHolder1$defaultUC1$CurrencyMatrixAllCountries1$LinkButton1",
              `srch-term`="",
              `ctl00$ContentPlaceHolder1$defaultUC1$CurrencyMatrixAllCountries1$GridView1$ctl01$DropDownListCountry`="top",
              `ctl00$ContentPlaceHolder1$defaultUC1$CurrencyMatrixAllCountries1$ParameterContinent`="",
              `__ASYNCPOST`="false"))


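# Extract the response body as text
res_t <- content(res, as="text")
# Drop the first line of the response and rejoin the rest as HTML
res_h <- paste(unlist(strsplit(res_t, "\r\n"))[-1], collapse="\n")

css <- "#ctl00_ContentPlaceHolder1_defaultUC1_CurrencyMatrixAllCountries1_GridView1"

# Parse the HTML and extract the table (read_html() replaces the deprecated html())
tab <- read_html(res_h) %>%
  html_nodes(css) %>%
  html_table()

# Look at one column of the resulting table
tab[[1]]$COUNTRIESWORLDAMERICAEUROPEASIAAUSTRALIAAFRICA

glimpse(tab[[1]])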
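
To see the response-munging step in isolation, here is a minimal offline sketch. The response string is made up for illustration (it is not the site's real output): a non-HTML first line followed by an HTML table. The `#rates` id and the `demo` variable are likewise hypothetical.

```r
library(rvest)

# Made-up stand-in for the response body: one non-HTML line, then HTML.
res_t <- paste(
  "1|#||4|123|updatePanel|panel|",
  "<table id=\"rates\">",
  "<tr><th>Country</th><th>Rate</th></tr>",
  "<tr><td>A</td><td>1.5</td></tr>",
  "<tr><td>B</td><td>2.5</td></tr>",
  "</table>",
  sep = "\r\n"
)

# Split on CRLF, drop the first line, and rejoin the remainder as HTML.
lines <- strsplit(res_t, "\r\n", fixed = TRUE)[[1]]
res_h <- paste(lines[-1], collapse = "\n")

# Parse the remaining HTML and extract the table by its (hypothetical) id.
demo <- read_html(res_h) %>%
  html_nodes("#rates") %>%
  html_table()

demo[[1]]
```

If the real response is laid out differently, only the line-dropping step should need adjusting; the parsing and table extraction stay the same.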

Another alternative would be to use RSelenium to go to the page, click the "+", and then scrape the resulting table.
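
A rough sketch of that RSelenium route might look like the following. It assumes a working local Selenium and Firefox setup, which is not shown here. The function name `scrape_full_table`, the fixed wait, and in particular the button selector are assumptions: the button id is guessed from the `__EVENTTARGET` value in the POST above and should be checked in the browser's dev tools.

```r
# Sketch only: requires the RSelenium and rvest packages plus a local browser driver.
scrape_full_table <- function(
    url = "http://www.tradingeconomics.com/",
    # Assumed id of the "+" link, inferred from the __EVENTTARGET parameter.
    button_css = "#ctl00_ContentPlaceHolder1_defaultUC1_CurrencyMatrixAllCountries1_LinkButton1",
    table_css = "#ctl00_ContentPlaceHolder1_defaultUC1_CurrencyMatrixAllCountries1_GridView1",
    wait_secs = 3) {
  # Start a Selenium server and a browser session.
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  # Always shut down the browser and server, even on error.
  on.exit({
    driver$client$close()
    driver$server$stop()
  }, add = TRUE)

  remote <- driver$client
  remote$navigate(url)

  # Click the "+" button so the page loads all countries.
  button <- remote$findElement(using = "css selector", value = button_css)
  button$clickElement()
  Sys.sleep(wait_secs)  # crude fixed wait for the Ajax update to finish

  # Hand the rendered page source to rvest and extract the table.
  page <- rvest::read_html(remote$getPageSource()[[1]])
  tables <- rvest::html_table(rvest::html_nodes(page, table_css))
  tables[[1]]
}
```

Calling `scrape_full_table()` would then return the expanded table as a data frame, provided the selectors still match the live page. Driving a real browser is slower than replaying the POST, but it does not depend on reverse-engineering the request parameters.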
