Read data from a PHP website with R
Question
I would like to import data into R from a table like this one:
http://www.rout.gr/index.php?name=Rout&file=results&year=2011
I tried using the XML library, as suggested by the thread below, but I couldn't get anything.
Recommended answer
There do seem to be some funky things going on with that site. It seems to return no data unless you fake the user-agent. Even then, readHTMLTable doesn't behave too well: it returns an error if you pass it the whole doc. After reading the page source, you can see that the relevant table has the id table_results_r_1. Isolating that node and passing it to readHTMLTable works:
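library(XML)
library(httr)

theurl <- "http://www.rout.gr/index.php?name=Rout&file=results&year=2011"

# The site appears to return no data without a browser-like user agent
resp <- GET(theurl, user_agent("Mozilla"))
doc <- htmlParse(content(resp, as = "text"), asText = TRUE)

# Isolate the results table by its id, then convert that node to a data frame
tables <- getNodeSet(doc, "//table[@id='table_results_r_1']")
results <- readHTMLTable(tables[[1]])
rm(doc)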
Now you'll need to tidy up the table column names.
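As a rough illustration of that tidying step, here is a minimal base R sketch. The actual column names on the site are not shown above, so the data frame and its headers below are hypothetical stand-ins, and tidy_names is a helper name invented for this example, not something from the XML package.

```r
# Hypothetical stand-in for the data frame returned by readHTMLTable();
# the real column names from the site are not known here.
results <- data.frame(a = 1:2, b = c("x", "y"), c = c("00:10", "00:12"),
                      stringsAsFactors = FALSE)
names(results) <- c(" Pos. ", "Runner Name", "Finish\nTime")

# Illustrative helper (not part of any package): normalise header strings
tidy_names <- function(x) {
  x <- trimws(gsub("[[:space:]]+", " ", x))    # collapse whitespace runs, trim ends
  x <- tolower(gsub("[^[:alnum:]]+", "_", x))  # replace other characters with "_"
  x <- gsub("^_+|_+$", "", x)                  # drop leading/trailing underscores
  make.names(x, unique = TRUE)                 # ensure valid, unique R names
}

names(results) <- tidy_names(names(results))
```

If the scraped headers are simply wrong or empty, assigning a character vector of your own to names(results) is probably the more direct route.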