How can I scrape a table from a PHP website using R?

Question

I'm looking to import data into R from a table on this page:

https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-10

I've tried multiple methods using the XML and httr packages, with no luck. I have already looked at past posts, including:

Read data from a PHP website with R

Scraping HTML tables into R data frames using the XML package

I'm wondering whether I'm using the wrong table ID from the page source, or whether the table is not in a format that the tools I'm currently using can handle.

Any and all help is much appreciated. Thanks in advance!

Recommended answer

This won't give you exactly what you want, but it might help you get started:

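library(XML)

# Save the page to a local file so it can be parsed from disk.
fname <- "standings20190910.html"
download.file("https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-10", destfile = fname)

# Parse the saved HTML and select the table whose id attribute is "content".
doc0 <- htmlParse(file = fname, encoding = "UTF-8")
doc1 <- xmlRoot(doc0)
doc2 <- getNodeSet(doc1, "//table[@id='content']")

# Convert the first matching table to a data frame, ignoring its first row.
standings <- readHTMLTable(doc2[[1]], header = TRUE, skip.rows = 1, stringsAsFactors = FALSE)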

You can look at the HTML source code of the table you're trying to scrape, and then try to figure out how to create a useful R object from it. Look carefully at the documentation for getNodeSet and readHTMLTable in the manual of the XML package (https://cran.r-project.org/web/packages/XML/XML.pdf).
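
To check whether you are targeting the right table ID, it can help to list the id of every table in the parsed document before selecting one. The sketch below does that on a tiny inline HTML string instead of the real page, so the table ids, team names, and numbers in it are made up purely for illustration; the actual page may be structured differently and may need skip.rows or further cleanup.

```r
library(XML)

# Hypothetical miniature page standing in for the real one.
# The ids and values here are invented for illustration only.
html <- '
<html><body>
  <table id="nav"><tr><td>Home</td><td>Stats</td></tr></table>
  <table id="content">
    <tr><th>Team</th><th>W</th><th>L</th></tr>
    <tr><td>AAA</td><td>90</td><td>55</td></tr>
    <tr><td>BBB</td><td>80</td><td>65</td></tr>
  </table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Step 1: list the id attribute of every table to see what is available.
table_ids <- xpathSApply(doc, "//table", xmlGetAttr, "id",
                         default = NA_character_)
print(table_ids)

# Step 2: select the table of interest by id and convert it to a data frame.
nodes <- getNodeSet(doc, "//table[@id='content']")
standings <- readHTMLTable(nodes[[1]], header = TRUE,
                           stringsAsFactors = FALSE)

# Cell values are read as character; convert numeric columns explicitly.
standings$W <- as.integer(standings$W)
standings$L <- as.integer(standings$L)
str(standings)
```

If the id you expect does not show up in the list from step 1, the XPath expression in step 2 returns an empty node set, and indexing it with [[1]] fails. Checking length(nodes) first makes that case easier to diagnose.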
