Scraping financial data with R and rvest

Question

I am trying to get financial data from Morningstar.com; for example, I want MSFT's yearly revenue data.
The figures are in a row <div> inside a main <div> that serves as the table.
Following some examples, I tried to get the main table:

library(rvest)  # also re-exports the %>% pipe

url <- "http://financials.morningstar.com/income-statement/is.html?t=MSFT&region=usa&culture=en-US"
table <- url %>%
 read_html() %>%
 html_nodes(xpath='//*[@id="sfcontent"]/div[3]/div[3]') %>%
 html_table()

but I get an empty list(). html_nodes() itself returns {xml_nodeset (0)}, and I don't know how to handle that.
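
A minimal sketch of what that empty result means, assuming the rvest package is installed. The HTML string is a made-up stand-in for a page whose rows are filled in later by JavaScript: the XPath matches nothing, html_nodes() returns a zero-length nodeset, and html_table() on that nodeset gives list(). Checking length() first turns the silent empty result into an explicit signal.

```r
library(rvest)

# Made-up stand-in (assumption) for the static HTML of a JavaScript-driven page:
# the container is present, but the table rows have not been injected yet.
page <- read_html('<html><body><div id="sfcontent"><div></div></div></body></html>')

# Same XPath as in the question; there is no div[3] under #sfcontent here
nodes <- html_nodes(page, xpath = '//*[@id="sfcontent"]/div[3]/div[3]')
tables <- html_table(nodes)

# A zero-length nodeset means the XPath matched nothing in the downloaded HTML
if (length(nodes) == 0) {
  message("XPath matched nothing; the data is probably not in the static HTML")
}
```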

Solution

read.csv("http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:MSFT&region=usa&culture=en-US&cur=&reportType=is&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=865827&denominatorView=raw&number=3", skip=1)

   Fiscal.year.ends.in.June..USD.in.millions.except.per.share.data. X2011.06 X2012.06 X2013.06 X2014.06 X2015.06      TTM
1                                                           Revenue 69943.00 73723.00 77849.00 86833.00 93580.00 90758.00
2                                                   Cost of revenue 15577.00 17530.00 20249.00 26934.00 33038.00 31972.00
3                                                      Gross profit 54366.00 56193.00 57600.00 59899.00 60542.00 58786.00
4                                                Operating expenses       NA       NA       NA       NA       NA       NA
5                                          Research and development  9043.00  9811.00 10411.00 11381.00 12046.00 11943.00
6                                 Sales, General and administrative 18162.00 18426.00 20425.00 20632.00 20324.00 19862.00
7                             Restructuring, merger and acquisition       NA       NA       NA   127.00       NA       NA
8                                          Other operating expenses       NA  6193.00       NA       NA 10011.00  8871.00
9                                          Total operating expenses 27205.00 34430.00 30836.00 32140.00 42381.00 40676.00
10                                                 Operating income 27161.00 21763.00 26764.00 27759.00 18161.00 18110.00
11                                                 Interest Expense   295.00   380.00   429.00   597.00   781.00   869.00
12                                           Other income (expense)  1205.00   884.00   717.00   658.00  1127.00   883.00
13                                              Income before taxes 28071.00 22267.00 27052.00 27820.00 18507.00 18124.00
14                                       Provision for income taxes  4921.00  5289.00  5189.00  5746.00  6314.00  5851.00
15                            Net income from continuing operations 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
16                                                       Net income 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
17                      Net income available to common shareholders 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
18                                               Earnings per share       NA       NA       NA       NA       NA       NA
19                                                            Basic     2.73     2.02     2.61     2.66     1.49     1.51
20                                                          Diluted     2.69     2.00     2.58     2.63     1.48     1.50
21                              Weighted average shares outstanding       NA       NA       NA       NA       NA       NA
22                                                            Basic  8490.00  8396.00  8375.00  8299.00  8177.00  8114.00
23                                                          Diluted  8593.00  8506.00  8470.00  8399.00  8254.00  8183.00
24                                                           EBITDA 31132.00 25614.00 31236.00 33629.00 25245.00 24983.00
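
Since the goal was the yearly revenue figures, here is a minimal sketch of pulling the Revenue row out of the data frame that read.csv() returns. To keep it runnable offline it parses a small inline excerpt instead of the live URL; the layout (one title line that skip = 1 drops, then a header row) is an assumption inferred from the output above, the title text is a placeholder, and the numbers are copied from that output.

```r
# Inline excerpt standing in for the Export CSV (assumed layout: one title line,
# then a header row). Values are copied from the output above.
csv_text <- "Placeholder title line (dropped by skip = 1)
Fiscal year ends in June. USD in millions except per share data.,2011-06,2012-06,2013-06,2014-06,2015-06,TTM
Revenue,69943,73723,77849,86833,93580,90758
Cost of revenue,15577,17530,20249,26934,33038,31972"

fin <- read.csv(text = csv_text, skip = 1, stringsAsFactors = FALSE)

# Column 1 holds the line-item labels; select the Revenue row by label, not position
revenue <- unlist(fin[fin[[1]] == "Revenue", -1])

# Turn column names like "X2011.06" back into "2011.06"
names(revenue) <- sub("^X", "", names(revenue))
revenue
```

With the real export you would pass the URL instead of text = csv_text, as in the answer. Matching on the label in the first column is sturdier than relying on Revenue always being row 1.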

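It's super helpful to make your browser Developer Tools' "Network" tab your BFF. The income-statement page appears to fetch its figures with a separate request after it loads, which is why the HTML that read_html() downloads has no rows for your XPath to match; the Network tab shows those requests directly.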

(That URL came from inspecting what the "Export" button does.)
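
If you need the same report for other tickers, one option is to rebuild the captured Export URL from its query parameters. The helper below is hypothetical: the parameter names and values are copied from the URL in the answer, and reading t as the exchange-qualified ticker and reportType = "is" as the income statement is an inference from the URL (the page itself is is.html), not documented behavior.

```r
# Hypothetical helper: rebuilds the captured Export URL, letting the ticker and
# report type vary while keeping every other query parameter exactly as captured.
morningstar_csv_url <- function(ticker, report_type = "is") {
  params <- c(
    t = ticker, region = "usa", culture = "en-US", cur = "",
    reportType = report_type, period = "12", dataType = "A", order = "asc",
    columnYear = "5", curYearPart = "1st5year", rounding = "3", view = "raw",
    r = "865827",  # copied as-is; looks like a cache-busting value (assumption)
    denominatorView = "raw", number = "3"
  )
  # Join as name=value pairs separated by "&", in the captured order
  query <- paste(names(params), params, sep = "=", collapse = "&")
  paste0("http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&", query)
}
```

For example, read.csv(morningstar_csv_url("XNAS:MSFT"), skip = 1) makes the same request as the call in the answer.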
