Web scraping with rvest

Question

I'm trying to grab a table of data using read_html from the R package rvest.

I've tried the code below:

library(rvest)
raw <- read_html("https://demanda.ree.es/movil/peninsula/demanda/tablas/2016-01-02/2")

I don't believe this pulled the data from the table, since 'raw' is just a list of 2:

'node:<externalptr>' and 'doc:<externalptr>'
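For context, a minimal sketch (assuming rvest >= 1.0, which provides html_element()): those two external pointers are simply how an xml_document holds the parsed tree, and the table contents are pulled out afterwards with selector functions. The inline HTML and its values below are hypothetical, a static table whose rows already exist in the markup:

```r
library(rvest)

# Hypothetical static page: the rows are already in the HTML,
# so no JavaScript is needed to populate them.
html <- '
<table id="tabla_generacion">
  <tr><th>hora</th><th>demanda</th></tr>
  <tr><td>00:00</td><td>25000</td></tr>
  <tr><td>00:10</td><td>24800</td></tr>
</table>'

# read_html() accepts literal HTML as well as a URL. It returns an
# xml_document whose "node" and "doc" fields are external pointers to
# the parsed tree, not the table data itself.
doc <- read_html(html)
class(doc)

# Select the table by id and convert it to a data frame (a tibble).
tbl <- html_table(html_element(doc, "#tabla_generacion"))
tbl
```

If the same two steps return an empty or placeholder-filled table on the real page, the data is not in the downloaded HTML at all.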

I've also tried selecting the table by XPath:

html_nodes(raw,xpath = '//*[(@id = "tabla_generacion")]//*[contains(concat( " ", @class, " " ), concat( " ", "ng-scope", " " ))]')
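A likely reason this XPath matches nothing, sketched under assumptions: AngularJS typically adds the ng-scope class to elements while it renders the page in the browser, so the raw HTML that read_html() downloads may not contain that class at all. The template below is a hypothetical, simplified stand-in for an unrendered Angular table (the ng-repeat expression and the {{ }} field names are made up); it assumes rvest >= 1.0 for html_elements() and html_text2():

```r
library(rvest)

# Hypothetical unrendered Angular template: before Angular runs, the table
# holds one template row with {{ }} placeholders and no "ng-scope" classes.
template <- '
<table id="tabla_generacion">
  <tr ng-repeat="fila in datos">
    <td>{{fila.hora}}</td><td>{{fila.valor}}</td>
  </tr>
</table>'

doc <- read_html(template)

# The XPath from the question: descendants of #tabla_generacion whose
# class list contains "ng-scope".
xp <- '//*[(@id = "tabla_generacion")]//*[contains(concat( " ", @class, " " ), concat( " ", "ng-scope", " " ))]'
nodes <- html_elements(doc, xpath = xp)
length(nodes)   # no matches in the unrendered template

# The template row is there, but its cells only hold placeholders.
html_text2(html_elements(doc, "#tabla_generacion td"))
```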

Any advice on what to try next?

Thanks.

Recommended answer

This website uses Angular to make a separate call that fetches the data. You can make that call yourself to get the raw JSON. The response is not pure JSON (it is wrapped in a JSONP callback, angular.callbacks._2(...);), so you can't just run fromJSON(url); you have to download the data and strip the non-JSON wrapper before parsing it.
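Because the URL passes callback=angular.callbacks._2, the server returns the JSON as the argument of a function call of that name. A minimal offline sketch of the stripping step follows; strip_jsonp() is a hypothetical helper, and the payload is made up (the datos, hora and valor names are not the real response fields):

```r
library(jsonlite)

# Hypothetical helper: strip a JSONP wrapper of the form  callbackName( ... );
# and return the JSON text inside. The callback name is matched generically.
strip_jsonp <- function(txt) {
  txt <- trimws(txt)
  txt <- sub("^[^(]*\\(", "", txt)   # drop everything up to the first "("
  sub("\\);?$", "", txt)             # drop the closing ")" and optional ";"
}

# Made-up response body, shaped like the JSONP wrapper described above.
payload <- 'angular.callbacks._2({"datos":[{"hora":"00:00","valor":25000},{"hora":"00:10","valor":24800}]});'

json <- strip_jsonp(payload)
res <- fromJSON(json, simplifyDataFrame = TRUE)
res$datos   # an array of objects is simplified to a data frame
```

Matching the callback name generically means the same helper keeps working if the page's callback counter changes (for example to _3).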

library(jsonlite)
library(httr)
url <- "https://demanda.ree.es/WSvisionaMovilesPeninsulaRest/resources/demandaGeneracionPeninsula?callback=angular.callbacks._2&curva=DEMANDA&fecha=2016-01-02"
a <- GET(url)
stop_for_status(a)  # stop early on HTTP errors
a <- content(a, as = "text", encoding = "UTF-8")
# get rid of the non-JSON stuff (the JSONP callback wrapper)...
a <- sub("^angular\\.callbacks\\._2\\(", "", a)
a <- sub("\\);?\\s*$", "", a)
df <- fromJSON(a, simplifyDataFrame = TRUE)
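To fetch other days, the same steps can be wrapped in a function. This is a sketch under two assumptions not confirmed by the answer: that the endpoint accepts any date in YYYY-MM-DD form through the fecha parameter, and that it is still online (the URL dates from 2016). demanda_url() and get_demanda() are hypothetical names:

```r
library(httr)
library(jsonlite)

# Hypothetical helper: build the request URL for one date.
# Assumption: the endpoint accepts any YYYY-MM-DD date in "fecha".
demanda_url <- function(fecha) {
  sprintf(
    paste0("https://demanda.ree.es/WSvisionaMovilesPeninsulaRest/resources/",
           "demandaGeneracionPeninsula?callback=angular.callbacks._2",
           "&curva=DEMANDA&fecha=%s"),
    format(as.Date(fecha), "%Y-%m-%d")
  )
}

# Hypothetical helper: download one day, strip the JSONP wrapper, parse.
get_demanda <- function(fecha) {
  resp <- GET(demanda_url(fecha))
  stop_for_status(resp)                          # fail loudly on HTTP errors
  txt <- trimws(content(resp, as = "text", encoding = "UTF-8"))
  txt <- sub("^[^(]*\\(", "", txt)               # drop "callbackName("
  txt <- sub("\\);?$", "", txt)                  # drop the closing ");"
  fromJSON(txt, simplifyDataFrame = TRUE)
}
```

Usage, if the endpoint still responds: df <- get_demanda("2016-01-02"). For several days, lapply(seq(as.Date("2016-01-01"), by = "day", length.out = 3), get_demanda) returns one parsed result per date.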

I found this by pressing F12 in Chrome and looking at the "Sources" tab. The data that fills the table has to come from somewhere, so it's just a matter of figuring out where. I was unable to scrape the table with rvest itself: read_html() only downloads and parses the static HTML and does not run JavaScript, so the Angular call that fills the table in Chrome never happens in R, and there is probably no data in the page for rvest to scrape.