Scraping multiple tables out of a webpage in R
I am trying to pull mutual fund data into R. My code works for a single table, but it doesn't work when a webpage contains multiple tables.
Link - https://in.finance.yahoo.com/q/pm?s=115748.BO
My Code
url <- "https://in.finance.yahoo.com/q/pm?s=115748.BO"
library(XML)
perftable <- readHTMLTable(url, header = T, which = 1, stringsAsFactors = F)
But I am getting an error message:
Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message:
XML content does not seem to be XML: 'https://in.finance.yahoo.com/q/pm?s=115748.BO'
My questions are:
- How do I pull a specific table out of this webpage?
- How do I pull all tables out of this webpage?
- When there are multiple links, what would be an easy way to pull a specific table from each of those webpages?
Ahttps://in.finance.yahoo.com/q/pm?s=115748.BO
Ahttps://in.finance.yahoo.com/q/pm?s=115749.BO
Ahttps://in.finance.yahoo.com/q/pm?s=115750.BO
Remove the leading "A" from each link before using it.
readHTMLTable is not able to access https URLs directly. You can use a package like RCurl to download the page first. The headers on the tables are actually separate tables, and the page is composed of 30+ tables. The data you want is most likely in the table with class = yfnc_datamodoutline1:
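url <- "https://in.finance.yahoo.com/q/pm?s=115748.BO"
library(XML)
library(RCurl)
# Fetch the raw HTML over https with RCurl
appData <- getURL(url, ssl.verifypeer = FALSE)
# Parse the HTML text into a document
doc <- htmlParse(appData)
# Select every table with the target class via XPath
tables <- doc['//table[@class="yfnc_datamodoutline1"]']
# Convert the first match to a data frame
perftable <- readHTMLTable(tables[[1]], stringsAsFactors = FALSE)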
> perftable
                                     V1      V2
1            Morningstar Return Rating:    2.00
2                  Year-to-Date Return:   2.77%
3                5-Year Average Return:   9.76%
4                   Number of Years Up:       4
5                 Number of Years Down:       1
6  Best 1 Yr Total Return (2014-12-31):  37.05%
7 Worst 1 Yr Total Return (2011-12-31): -27.26%
8         Best 3-Yr Total Return (N/A):  23.11%
9        Worst 3-Yr Total Return (N/A):  -0.33%
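The answer above pulls one specific table. For the other two questions (all tables, and the same table across several links), the same approach can be wrapped in small helpers. This is only a sketch: `tables_from_html` and `scrape_many` are hypothetical names, not part of the XML or RCurl packages, and `sample_html` is a made-up stand-in page so the example runs without network access. It assumes each real page has at least one table with the class `yfnc_datamodoutline1`.

```r
library(XML)

# Parse HTML text and return every table matched by `xpath` as a data frame.
# Hypothetical helper; the default XPath "//table" matches all tables.
tables_from_html <- function(html_text, xpath = "//table") {
  doc <- htmlParse(html_text, asText = TRUE)
  nodes <- getNodeSet(doc, xpath)
  lapply(nodes, readHTMLTable, stringsAsFactors = FALSE)
}

# Made-up stand-in page, used only so the sketch runs offline.
sample_html <- '
<html><body>
  <table class="yfnc_datamodoutline1">
    <tr><td>Year-to-Date Return:</td><td>2.77%</td></tr>
    <tr><td>Number of Years Up:</td><td>4</td></tr>
  </table>
  <table class="other">
    <tr><td>a</td><td>b</td></tr>
  </table>
</body></html>'

# All tables on the page, as a list of data frames.
all_tables <- tables_from_html(sample_html)

# Only the tables with the target class.
perf_tables <- tables_from_html(sample_html,
                                '//table[@class="yfnc_datamodoutline1"]')

# For several links: fetch each page with RCurl and keep the first matching
# table. Needs network access, so it is defined here but not called.
# It will fail with an error for a page that has no matching table.
scrape_many <- function(urls,
                        xpath = '//table[@class="yfnc_datamodoutline1"]') {
  results <- lapply(urls, function(u) {
    html_text <- RCurl::getURL(u, ssl.verifypeer = FALSE)
    tables_from_html(html_text, xpath)[[1]]
  })
  setNames(results, urls)
}
```

With the three links from the question (after removing the leading "A"), `scrape_many(c(url1, url2, url3))` would return a list of data frames named by URL, which could then be combined or inspected one at a time.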