Scraping multiple tables out of a webpage in R


Problem description


I am trying to pull mutual fund data into R. My code works when a page has a single table, but it fails when a webpage contains multiple tables.

Link - https://in.finance.yahoo.com/q/pm?s=115748.BO

My Code

url <- "https://in.finance.yahoo.com/q/pm?s=115748.BO"
library(XML)
perftable <- readHTMLTable(url, header = T, which = 1, stringsAsFactors = F)

But I am getting an error message:

Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message:
XML content does not seem to be XML: 'https://in.finance.yahoo.com/q/pm?s=115748.BO'

My questions are:

  1. How do I pull a specific table out of this webpage?
  2. How do I pull all tables out of this webpage?
  3. When there are multiple links, what is an easy way to pull a specific table from each of those webpages?

Ahttps://in.finance.yahoo.com/q/pm?s=115748.BO

Ahttps://in.finance.yahoo.com/q/pm?s=115749.BO

Ahttps://in.finance.yahoo.com/q/pm?s=115750.BO

Remove the leading "A" from each link before using it.

Solution

The XML package cannot fetch https URLs itself, so readHTMLTable(url) fails. Download the page first with a package such as RCurl, then parse the downloaded text. The headers above the tables are actually separate tables, and the page as a whole is composed of 30+ tables. The data you want is most likely in the table with class yfnc_datamodoutline1:

library(XML)
library(RCurl)

url <- "https://in.finance.yahoo.com/q/pm?s=115748.BO"

# Download the page as text; RCurl handles https.
# Note: ssl.verifypeer = FALSE turns off certificate verification.
appData <- getURL(url, ssl.verifypeer = FALSE)
doc <- htmlParse(appData, asText = TRUE)

# XPath query: every <table> whose class is yfnc_datamodoutline1
tables <- doc['//table[@class="yfnc_datamodoutline1"]']
perftable <- readHTMLTable(tables[[1]], stringsAsFactors = FALSE)

> perftable
                                     V1      V2
1            Morningstar Return Rating:    2.00
2                  Year-to-Date Return:   2.77%
3                5-Year Average Return:   9.76%
4                   Number of Years Up:       4
5                 Number of Years Down:       1
6  Best 1 Yr Total Return (2014-12-31):  37.05%
7 Worst 1 Yr Total Return (2011-12-31): -27.26%
8         Best 3-Yr Total Return (N/A):  23.11%
9        Worst 3-Yr Total Return (N/A):  -0.33%
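
The code above addresses question 1 (one specific table). For question 2, a minimal sketch of reading every table at once is to call readHTMLTable() on the parsed document without a `which` argument, which returns a list with one data frame per <table> node. The HTML string below is a made-up stand-in for the downloaded page so that the example runs offline; its second table and the class name "other" are assumptions for illustration only. On a page built from many nested layout tables, some list elements will probably need filtering or cleaning afterwards.

```r
library(XML)

# Made-up stand-in for the page text; on the real page this string
# would come from getURL(url, ssl.verifypeer = FALSE).
html <- '<html><body>
<table class="yfnc_datamodoutline1">
  <tr><td>Year-to-Date Return:</td><td>2.77%</td></tr>
  <tr><td>5-Year Average Return:</td><td>9.76%</td></tr>
</table>
<table class="other">
  <tr><td>Number of Years Up:</td><td>4</td></tr>
  <tr><td>Number of Years Down:</td><td>1</td></tr>
</table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Without `which`, readHTMLTable() returns a list holding one
# data frame per <table> node found in the document.
all_tables <- readHTMLTable(doc, header = FALSE, stringsAsFactors = FALSE)

length(all_tables)   # how many tables were found
all_tables[[2]]      # pick a specific table by position
```

Selecting by position is fragile if the page layout changes, which is one reason the answer selects by class with XPath instead.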

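For question 3 (the same table from several links), one possible approach is to wrap the steps from the answer in a function and apply it to a vector of URLs. This is a sketch, not a tested scraper for the live site: `extract_table` and `scrape_tables` are hypothetical helper names, and the `fetch` argument is an assumption added so that the downloader can be swapped out or stubbed.

```r
library(XML)

# Hypothetical helper: parse one page's HTML text and return the first
# table whose class attribute equals `table_class`, or NULL if none.
extract_table <- function(html, table_class = "yfnc_datamodoutline1") {
  doc <- htmlParse(html, asText = TRUE)
  xpath <- sprintf('//table[@class="%s"]', table_class)
  nodes <- getNodeSet(doc, xpath)
  if (length(nodes) == 0) return(NULL)
  readHTMLTable(nodes[[1]], header = FALSE, stringsAsFactors = FALSE)
}

# Hypothetical wrapper: download each URL and extract its table.
# `fetch` defaults to RCurl::getURL as used in the answer; a page that
# fails to download or parse yields NULL instead of stopping the loop.
scrape_tables <- function(urls,
                          fetch = function(u) RCurl::getURL(u, ssl.verifypeer = FALSE)) {
  result <- lapply(urls, function(u) {
    tryCatch(extract_table(fetch(u)), error = function(e) NULL)
  })
  names(result) <- urls
  result
}

urls <- c("https://in.finance.yahoo.com/q/pm?s=115748.BO",
          "https://in.finance.yahoo.com/q/pm?s=115749.BO",
          "https://in.finance.yahoo.com/q/pm?s=115750.BO")

# perf_tables <- scrape_tables(urls)   # one data frame (or NULL) per URL
```

The result is a list named by URL, so `perf_tables[[urls[1]]]` would give the table for the first fund.
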