How to scrape a table with rvest and xpath?
Using the following documentation, I have been trying to scrape a series of tables from marketwatch.com. Here is the one represented by the code below. The link and xpath are already included in the code:
library(rvest)

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
valuation <- url %>%
  html() %>%
  html_nodes(xpath='//*[@id="maincontent"]/div[2]/div[1]') %>%
  html_table()
valuation <- valuation[[1]]
I get the following error:
Warning message:
'html' is deprecated.
Use 'read_html' instead.
See help("Deprecated")
Thanks in advance.
Solution
That website doesn't use an HTML table, so html_table() can't find anything. It actually uses div classes column and data lastcolumn.
So you can do something like
library(rvest)

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
valuation_col <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@class="column"]')
valuation_data <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@class="data lastcolumn"]')
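To turn those two node sets into a usable table, one approach (a sketch, assuming the class names above still match the page markup) is to extract the text from each set and bind the pairs into a data frame:

```r
library(rvest)

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
page <- read_html(url)

# Pull the label text and the value text from the two div classes
labels <- page %>%
  html_nodes(xpath = '//*[@class="column"]') %>%
  html_text(trim = TRUE)
values <- page %>%
  html_nodes(xpath = '//*[@class="data lastcolumn"]') %>%
  html_text(trim = TRUE)

# Pair them up; the lengths may differ if the page layout changes,
# so truncate to the shorter of the two
n <- min(length(labels), length(values))
valuation <- data.frame(metric = labels[seq_len(n)],
                        value  = values[seq_len(n)],
                        stringsAsFactors = FALSE)
```

Parsing the page once with read_html() and reusing the result also avoids fetching the URL twice, as the two separate pipelines above do.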
Or even
url %>%
read_html() %>%
html_nodes(xpath='//*[@class="section"]')
To get you most of the way there.
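A minimal way to see what that broader query returns (again assuming the page still uses the section class) is to pull the text of each matched node and inspect it before deciding how to parse further:

```r
library(rvest)

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"

# Grab every element with class "section" and look at its text content
sections <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@class="section"]')

section_text <- html_text(sections, trim = TRUE)
head(section_text)
```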
Please also read their terms of use - particularly 3.4.