Web Scraping with rvest and R


Question

I am trying to scrape the total assets of a particular fund, in this case ADAFX, from http://www.morningstar.com/funds/xnas/adafx/quote.html, but the result is always character(0), i.e. empty. What am I doing wrong?

I have used rvest before with mixed results, so I figured it was time to get expert help from the community of trusted gurus (that's you).

library(rvest)

Symbol.i <- "ADAFX"
url <- paste0("http://www.morningstar.com/funds/xnas/", Symbol.i, "/quote.html")

# Assign the value returned by tryCatch(); an assignment inside the error
# handler would only create a variable local to that handler.
NetAssets.i <- tryCatch(
  url %>%
    read_html() %>%
    html_nodes(xpath = '//*[@id="gr_total_asset_wrap"]/span/span') %>%
    html_text(),
  error = function(e) NA_character_
)

Thanks in advance, cheers,

Aaron Soderstrom

Recommended answer

It's a dynamic page that loads the data for its various sections via XHR requests, so the value you want is not in the HTML that read_html() downloads. You have to look at the Network tab of your browser's Developer Tools to find the URLs that serve the target content.

library(httr)
library(rvest)

# Request the XHR endpoint that serves the quote header fragment
res <- GET(url = "http://quotes.morningstar.com/fundq/c-header",
           query = list(
             t="XNAS:ADAFX",
             region="usa",
             culture="en-US",
             version="RET",
             test="QuoteiFrame"
           )
)

# Parse the returned HTML and extract the text of the total-assets span
content(res) %>%
  html_nodes("span[vkey='TotalAssets']") %>%
  html_text() %>%
  trimws()
## [1] "20.6  mil"
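
To see why the original XPath came back empty, here is a small offline sketch. The two HTML strings are made-up stand-ins (assumptions, not the real Morningstar markup): one for the initial page, where the wrapper is still empty because JavaScript has not filled it in, and one for the fragment the XHR endpoint returns. Only the id and the vkey attribute are taken from the code above.

```r
library(rvest)

# Hypothetical stand-in for the initial HTML: the wrapper exists but is empty
# until the browser's JavaScript fills it in.
static_page <- read_html(
  '<html><body><div id="gr_total_asset_wrap"></div></body></html>'
)

# Hypothetical stand-in for the fragment returned by the XHR request.
xhr_fragment <- read_html(
  '<html><body><span vkey="TotalAssets"> 20.6  mil </span></body></html>'
)

# The question's XPath matches nothing in the static HTML, so html_text()
# returns a zero-length character vector rather than raising an error.
empty_result <- static_page %>%
  html_nodes(xpath = '//*[@id="gr_total_asset_wrap"]/span/span') %>%
  html_text()

# The answer's CSS selector finds the value in the XHR fragment.
total_assets <- xhr_fragment %>%
  html_nodes("span[vkey='TotalAssets']") %>%
  html_text() %>%
  trimws()

print(empty_result)
print(total_assets)
```

Note that no error is raised in the first case, so a tryCatch() around the scrape never fires; checking length(result) == 0 is the more reliable way to detect a missing node.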
