How to scrape key statistics from Yahoo! Finance with R?
Question
Unfortunately, I am not an experienced scraper yet. However, I need to scrape key statistics for multiple stocks from Yahoo Finance with R.
I am somewhat familiar with scraping data directly from HTML using read_html(), html_nodes(), and html_text() from the rvest package. However, the MSFT key stats web page is a bit complicated, and I am not sure whether all the stats are kept in XHR, JS, or Doc. I am guessing the data is stored in JSON.
If anyone knows a good way to extract and parse the data on this web page with R, please answer my question. Many thanks in advance!
Or, if there is a more convenient way to extract these metrics via quantmod or Quandl, please let me know. That would be an extremely good solution!
The goal is to have tickers/symbols as row names/row labels and the statistics as columns. An illustration of what I need can be found at this Finviz link:
https://finviz.com/screener.ashx
The reason I would like to scrape Yahoo Finance data is that Yahoo also covers the Enterprise and EBITDA key stats.
I mean the key statistics page, for example: https://finance.yahoo.com/quote/MSFT/key-statistics/ . The code should produce one data frame with stock symbols as rows and key stats as columns.
Recommended answer
Code
library(rvest)
library(tidyverse)

# Define stock name
stock <- "MSFT"

# Extract and transform data
df <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>%
  read_html() %>%
  html_table() %>%
  map_df(bind_cols) %>%
  # Transpose
  t() %>%
  as_tibble()

# Set first row as column names
colnames(df) <- df[1,]
# Remove first row
df <- df[-1,]
# Add stock name column
df$Stock_Name <- stock
Result
  Revenue `Total Revenue` `Cost of Revenu… `Gross Profit`
  <chr>   <chr>           <chr>            <chr>
1 6/30/2… 110,360,000     38,353,000       72,007,000
2 6/30/2… 96,571,000      33,850,000       62,721,000
3 6/30/2… 91,154,000      32,780,000       58,374,000
4 6/30/2… 93,580,000      33,038,000       60,542,000
# ... with 25 more variables: ...
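Every column in this result is character (`<chr>`), so the figures need converting before any arithmetic. A minimal sketch in base R, assuming the values use commas as thousands separators as in the output shown; `to_number` is a hypothetical helper name, not part of rvest or the tidyverse:

```r
# Hypothetical helper: turn strings such as "110,360,000" into numbers.
# Assumes commas are thousands separators; anything that still fails to
# parse (for example "-") becomes NA.
to_number <- function(x) {
  cleaned <- gsub(",", "", x, fixed = TRUE)
  suppressWarnings(as.numeric(cleaned))
}

# Values copied from the `Total Revenue` column shown above
total_revenue <- c("110,360,000", "96,571,000", "91,154,000", "93,580,000")
to_number(total_revenue)
```

The same helper could be applied to several columns at once, for example `df[cols] <- lapply(df[cols], to_number)`, where `cols` is a vector of column names you choose that leaves out the date and `Stock_Name` columns.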
Edit: for convenience, the same steps wrapped in a function:
get_yahoo <- function(stock){
  # Extract and transform data
  x <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>%
    read_html() %>%
    html_table() %>%
    map_df(bind_cols) %>%
    # Transpose
    t() %>%
    as_tibble()

  # Set first row as column names
  colnames(x) <- x[1,]
  # Remove first row
  x <- x[-1,]
  # Add stock name column
  x$Stock_Name <- stock
  return(x)
}
Usage: get_yahoo(stock)
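To get the single data frame the question asks for, `get_yahoo()` can be applied to a vector of tickers and the results stacked. A minimal sketch in base R, assuming `get_yahoo()` is defined as above and returns one data frame per ticker; `get_many` and its `fetch` argument are hypothetical names, and tickers whose request fails are skipped with a warning rather than stopping the run:

```r
# Hypothetical wrapper: fetch several tickers and stack the results.
# `fetch` defaults to get_yahoo() from the answer; the default is only
# looked up when get_many() is actually called without `fetch`.
get_many <- function(stocks, fetch = get_yahoo) {
  results <- lapply(stocks, function(s) {
    tryCatch(
      as.data.frame(fetch(s), stringsAsFactors = FALSE),
      error = function(e) {
        warning("Skipping ", s, ": ", conditionMessage(e), call. = FALSE)
        NULL
      }
    )
  })
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) return(data.frame())

  # Tickers may not share every column, so pad missing ones with NA
  all_cols <- unique(unlist(lapply(results, names)))
  padded <- lapply(results, function(d) {
    missing_cols <- setdiff(all_cols, names(d))
    if (length(missing_cols) > 0) d[missing_cols] <- NA
    d[all_cols]
  })
  do.call(rbind, padded)
}

# Usage (needs network access):
# all_stats <- get_many(c("MSFT", "AAPL", "GOOG"))
```

Because each ticker contributes one row per reporting period, the ticker stays in the `Stock_Name` column rather than becoming a row name.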