How to scrape key statistics from Yahoo! Finance with R?

Question

Unfortunately, I am not an experienced scraper yet. However, I need to scrape key statistics for multiple stocks from Yahoo Finance with R.

I am somewhat familiar with scraping data directly from HTML using read_html(), html_nodes(), and html_text() from the rvest package. However, the MSFT key stats web page is a bit more complicated, and I am not sure whether all the stats are kept in XHR, JS, or Doc. I am guessing the data is stored in JSON.

If anyone knows a good way to extract and parse the data on this web page with R, kindly answer my question. Many thanks in advance!

Or, if there is a more convenient way to extract these metrics via quantmod or Quandl, kindly let me know. That would be an extremely good solution!

The goal is to have tickers/symbols as row names/row labels and the statistics as columns. An illustration of what I need can be found at this Finviz link:

https://finviz.com/screener.ashx

The reason I would like to scrape Yahoo Finance data is that Yahoo also includes Enterprise Value and EBITDA among its key stats.

I meant to refer to the key statistics page, for example https://finance.yahoo.com/quote/MSFT/key-statistics/. The code should produce one data frame with stock symbols as rows and key stats as columns.

Recommended answer

Code

library(rvest)
library(tidyverse)

# Define stock name
stock <- "MSFT"

# Extract and transform data
df <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>% 
    read_html() %>% 
    html_table() %>% 
    map_df(bind_cols) %>% 
    # Transpose
    t() %>%
    as_tibble()

# Set first row as column names (unlist() turns the one-row tibble into a plain vector)
colnames(df) <- unlist(df[1, ])
# Remove first row
df <- df[-1, ]
# Add stock name column
df$Stock_Name <- stock

Result

  Revenue `Total Revenue` `Cost of Revenu… `Gross Profit`
  <chr>   <chr>           <chr>            <chr>         
1 6/30/2… 110,360,000     38,353,000       72,007,000    
2 6/30/2… 96,571,000      33,850,000       62,721,000    
3 6/30/2… 91,154,000      32,780,000       58,374,000    
4 6/30/2… 93,580,000      33,038,000       60,542,000    
# ... with 25 more variables: ...
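
Every column in the result above is of type <chr>, with thousands separators inside the values. If numeric columns are needed, a minimal sketch along the following lines may help. The helper name to_numeric_cols and its keep argument are hypothetical (not part of the original answer), and it assumes the date column is called Revenue as in the output shown; values that cannot be parsed become NA with a warning.

```r
library(dplyr)
library(readr)

# Hypothetical helper: convert comma-formatted character columns to numeric.
# `keep` lists columns that should stay as text (assumed names, adjust as needed).
to_numeric_cols <- function(df, keep = c("Revenue", "Stock_Name")) {
  df %>%
    mutate(across(-any_of(keep), parse_number))
}

# Small illustrative input shaped like the result above
example <- tibble(
  Revenue         = c("6/30/2018", "6/30/2017"),
  `Total Revenue` = c("110,360,000", "96,571,000"),
  Stock_Name      = "MSFT"
)

converted <- to_numeric_cols(example)
```

parse_number() drops the grouping commas under the default locale, so `Total Revenue` becomes a double column while Revenue and Stock_Name stay as text.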

Edit: Or, for convenience, as a function:

get_yahoo <- function(stock){
  # Extract and transform data
  x <- paste0("https://finance.yahoo.com/quote/", stock, "/financials?p=", stock) %>% 
    read_html() %>% 
    html_table() %>% 
    map_df(bind_cols) %>% 
    # Transpose
    t() %>%
    as_tibble()

  # Set first row as column names (unlist() turns the one-row tibble into a plain vector)
  colnames(x) <- unlist(x[1, ])
  # Remove first row
  x <- x[-1, ]
  # Add stock name column
  x$Stock_Name <- stock

  return(x)
}

Usage: get_yahoo(stock)
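
Since the question asks for several stocks in one data frame, the function can be applied to a vector of symbols and the results stacked. The following is only a sketch: get_many and its fetch argument are hypothetical names, and it assumes the fetch function (for example get_yahoo) returns one data frame per symbol. Symbols whose download fails are skipped instead of aborting the whole run.

```r
library(purrr)
library(dplyr)

# Hypothetical wrapper: call `fetch` for each symbol and row-bind the results.
# `fetch` is any function taking one symbol and returning a data frame.
get_many <- function(symbols, fetch) {
  # possibly() returns NULL instead of raising an error when a fetch fails
  safe_fetch <- possibly(fetch, otherwise = NULL)

  symbols %>%
    map(safe_fetch) %>%
    compact() %>%      # drop the NULLs left by failed symbols
    bind_rows()
}
```

Usage: get_many(c("MSFT", "AAPL"), get_yahoo). Because each data frame already carries a Stock_Name column, the stacked result keeps track of which rows belong to which symbol.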
