R web scraping across multiple pages


Problem Description



I am working on a web scraping program to search for specific wines and return a list of local wines of that variety. The problem I am having is multiple page results. The code below is a basic example of what I am working with

url2 <- "http://www.winemag.com/?s=washington+merlot&search_type=reviews"
htmlpage2 <- read_html(url2)
names2 <- html_nodes(htmlpage2, ".review-listing .title")
Wines2 <- html_text(names2)

For this specific search there are 39 pages of results. I know the url changes to http://www.winemag.com/?s=washington%20merlot&drink_type=wine&page=2, but is there an easy way to make the code loop through all the returned pages and compile the results from all 39 pages into a single list? I know I can manually do all the urls, but that seems like overkill.
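The pattern in the question — only the `page=` query parameter changes between result pages — is what makes a loop possible. As a minimal sketch (page count of 39 taken from the question; `%%` is how a literal `%` is escaped inside a `sprintf()` format string), the full set of URLs can be generated up front:

```r
# Build one URL per results page by filling the %d placeholder with the
# page number; %%20 becomes a literal %20 (URL-encoded space) after sprintf().
url_template <- "http://www.winemag.com/?s=washington%%20merlot&drink_type=wine&page=%d"
urls <- sprintf(url_template, 1:39)

urls[1]   # ends in "page=1"
urls[39]  # ends in "page=39"
```

Each URL can then be fed to `read_html()` in turn, which is exactly what the answer below does inline with `sprintf(url_base, i)`.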

Solution

You can do something similar with purrr::map_df() as well if you want all the info as a data.frame:

library(rvest)
library(purrr)

url_base <- "http://www.winemag.com/?s=washington%%20merlot&drink_type=wine&page=%d"  # %%20 yields a literal %20 (encoded space) after sprintf()

map_df(1:39, function(i) {

  # simple but effective progress indicator
  cat(".")

  pg <- read_html(sprintf(url_base, i))

  data.frame(wine=html_text(html_nodes(pg, ".review-listing .title")),
             excerpt=html_text(html_nodes(pg, "div.excerpt")),
             rating=gsub(" Points", "", html_text(html_nodes(pg, "span.rating"))),  # "96 Points" -> "96"
             appellation=html_text(html_nodes(pg, "span.appellation")),
             price=gsub("\\$", "", html_text(html_nodes(pg, "span.price"))),        # "$140" -> "140"
             stringsAsFactors=FALSE)

}) -> wines

dplyr::glimpse(wines)
## Observations: 1,170
## Variables: 5
## $ wine        (chr) "Charles Smith 2012 Royal City Syrah (Columbia Valley (WA)...
## $ excerpt     (chr) "Green olive, green stem and fresh herb aromas are at the ...
## $ rating      (chr) "96", "95", "94", "93", "93", "93", "93", "93", "93", "93"...
## $ appellation (chr) "Columbia Valley", "Columbia Valley", "Columbia Valley", "...
## $ price       (chr) "140", "70", "70", "20", "70", "40", "135", "50", "60", "3...
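As the `glimpse()` output shows, `rating` and `price` come back as character columns. If you want to filter or sort numerically, a short follow-up conversion works; the sketch below uses a tiny stand-in data frame (two rows invented for illustration) with the same column names as the answer's result:

```r
# Stand-in for the scraped result: rating/price arrive as character vectors.
wines <- data.frame(rating = c("96", "95"),
                    price  = c("140", "70"),
                    stringsAsFactors = FALSE)

# Coerce to numeric so comparisons and arithmetic behave as expected.
wines$rating <- as.numeric(wines$rating)
wines$price  <- as.numeric(wines$price)

subset(wines, rating >= 96)  # keeps only the 96-point row
```

The same two `as.numeric()` lines can be applied directly to the real `wines` data frame produced by `map_df()` above.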
