Web scraping of key stats in Yahoo! Finance with R

Question

Is anyone experienced in scraping data from the Yahoo! Finance key statistics page with R? I am familiar with scraping data directly from HTML using read_html(), html_nodes(), and html_text() from the rvest package. However, this web page (MSFT key stats) is a bit more complicated, and I am not sure whether the statistics are delivered via XHR, JS, or the HTML document itself. My guess is that the data is stored as JSON. If anyone knows a good way to extract and parse the data on this page with R, please answer; many thanks in advance!
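Before choosing a tool, it helps to check where a statistic actually lives. read_html() sees only the raw HTML the server returns, not the page the browser renders after running JavaScript, so a value can be visible on screen yet missing from html_nodes() results. The sketch below uses a made-up inline page (not Yahoo!'s real markup) with one value in an HTML table and another inside a JSON object assigned in a <script> tag, a common pattern on client-side rendered pages. The `window.__DATA__` name, the `#stats` selector, and the regex are illustrative assumptions. It uses html_elements(), the rvest 1.0 name for html_nodes(), plus jsonlite::fromJSON().

```r
library(rvest)     # read_html(), html_elements(), html_text()
library(jsonlite)  # fromJSON()

# Made-up page standing in for a downloaded quote page (NOT real Yahoo! markup):
# one statistic is in the HTML table, the other only in a JSON blob in a script.
page <- read_html('
<html><body>
  <table id="stats"><tr><td>Beta</td><td>0.9</td></tr></table>
  <script>window.__DATA__ = {"trailingPE": {"raw": 30.5, "fmt": "30.50"}};</script>
</body></html>')

# 1) Static HTML: select the table cells with a CSS selector and read their text
cells <- html_text(html_elements(page, "#stats td"), trim = TRUE)

# 2) Embedded JSON: take the script text, cut out the {...} literal, parse it
script <- html_text(html_elements(page, "script"))
json_txt <- sub("^[^{]*(\\{.*\\});?\\s*$", "\\1", script, perl = TRUE)
stats <- fromJSON(json_txt)

print(cells)                  # the label/value pair found in the HTML
print(stats$trailingPE$raw)   # the value that only exists in the JSON
```

Against a real page, view the page source (or the XHR requests in the browser's developer tools) to find the script or request that carries the numbers, then adapt the selector and pattern. If the data arrives through a separate XHR request, that URL can often be fetched and passed to fromJSON() directly.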

Alternatively, if there is a more convenient way to extract these metrics via quantmod or Quandl, please let me know; that would be an excellent solution!
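For the quantmod route, a minimal sketch, assuming a current quantmod release: getQuote() with src = "yahoo" fetches quote fields for one or more symbols, and yahooQF() selects which fields to return. The wrapper name yahoo_key_stats() and the default field names are assumptions; check the names your installed version accepts (see ?yahooQF). This covers only quote-level fields, not everything on the key-statistics page, and Yahoo! has changed its quote endpoints several times, so it may break without notice.

```r
# Hypothetical wrapper around quantmod::getQuote() for a few key statistics.
# Needs the quantmod package and network access to Yahoo! Finance when called.
yahoo_key_stats <- function(symbols,
                            fields = c("Market Capitalization", "P/E Ratio",
                                       "Earnings/Share")) {
  if (!requireNamespace("quantmod", quietly = TRUE)) {
    stop("The quantmod package is required: install.packages('quantmod')")
  }
  # yahooQF() turns human-readable field names into a quote format object
  quantmod::getQuote(symbols, src = "yahoo", what = quantmod::yahooQF(fields))
}
```

Usage would look like `yahoo_key_stats(c("MSFT", "AAPL"))`, which should return a data frame with one row per symbol. The package check is inside the function so that defining it does not require quantmod or a network connection.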

Recommended Answer

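I gave up on Excel a long time ago. R is definitely the way to go for things like this. The script below pulls the key statistics table from finviz.com (rather than Yahoo! Finance) for a list of tickers, combines the results into one data frame, and writes it to a CSV file.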

library(XML)

stocks <- c("AXP", "BA", "CAT", "CSCO")

# Download each ticker's finviz quote page and parse its statistics table
stock_tables <- lapply(stocks, function(s) {
  url <- paste0("https://finviz.com/quote.ashx?t=", s)
  webpage <- readLines(url, warn = FALSE)
  html <- htmlParse(paste(webpage, collapse = "\n"), asText = TRUE)
  tableNodes <- getNodeSet(html, "//table")

  # The key-stats grid was the 9th <table> on the page when this was written;
  # the index depends on finviz's layout and may need adjusting
  df <- readHTMLTable(tableNodes[[9]], header = paste0("data", 1:12))

  # Add a column identifying the stock
  df$stock <- s
  df
})
names(stock_tables) <- stocks

# Combine all stock data into one data frame
stockdata <- do.call(rbind, stock_tables)

# Move the stock ID to the first column (note the parentheses: 1:(n - 1))
stockdata <- stockdata[, c(ncol(stockdata), 1:(ncol(stockdata) - 1))]

# Save to CSV
write.table(stockdata, "C:/Users/your_path_here/Desktop/MyData.csv", sep = ",",
            row.names = FALSE, col.names = FALSE)
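The finviz table is read into twelve columns named data1 to data12. Assuming, as that count suggests, that the cells alternate label, value, label, value (six pairs per row), a small post-processing step can turn each stock's rows into a long metric/value table and convert strings such as "1.0B" or "12.5%" into numbers. Both helpers, pairs_to_long() and parse_stat(), are hypothetical, and the suffix rules (K, M, B, T, a trailing %, and "-" for missing) are an assumption about how the site formats values; check them against the actual page.

```r
# Hypothetical helper: reshape a grid whose columns alternate
# label, value, label, value, ... into a long metric/value data frame.
pairs_to_long <- function(grid) {
  stopifnot(ncol(grid) %% 2 == 0)
  label_cols <- seq(1, ncol(grid), by = 2)
  # Pair each label column with the value column immediately to its right
  pieces <- lapply(label_cols, function(j) {
    data.frame(metric = as.character(grid[[j]]),
               value  = as.character(grid[[j + 1]]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

# Hypothetical helper: convert strings like "1.0B", "12.5%", "-" to numbers.
# Assumed formats: K/M/B/T suffixes, a trailing %, and "-" for missing values.
parse_stat <- function(x) {
  x <- trimws(as.character(x))
  x[x %in% c("", "-")] <- NA
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9, T = 1e12, "%" = 0.01)
  suffix <- substring(x, nchar(x))
  scale <- ifelse(suffix %in% names(multipliers), multipliers[suffix], 1)
  number <- suppressWarnings(as.numeric(sub("[KMBT%]$", "", x)))
  unname(number * scale)
}

# Toy input in the same shape (placeholder values, not real market data)
grid <- data.frame(data1 = c("P/E", "EPS (ttm)"), data2 = c("10.0", "1.00"),
                   data3 = c("Market Cap", "Dividend %"),
                   data4 = c("1.0B", "12.5%"),
                   stringsAsFactors = FALSE)
long <- pairs_to_long(grid)
long$numeric <- parse_stat(long$value)
print(long)
```

Applied to the scraped data, something like `pairs_to_long(stockdata[stockdata$stock == "AXP", paste0("data", 1:12)])` would give one row per statistic for a single ticker, provided the alternating layout assumption holds.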

This concludes the article on web scraping key stats from Yahoo! Finance with R. We hope the recommended answer helps, and thank you for supporting IT屋!
