How to scrape a table with rvest and xpath?



Using the following documentation, I have been trying to scrape a series of tables from marketwatch.com.

Here is the one represented by the code below:

The link and XPath are already included in the code:

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
valuation <- url %>%
  html() %>%
  html_nodes(xpath='//*[@id="maincontent"]/div[2]/div[1]') %>%
  html_table()
valuation <- valuation[[1]]

I get the following warning:

Warning message:
'html' is deprecated.
Use 'read_html' instead.
See help("Deprecated") 
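The warning itself just reflects a rename: newer rvest versions deprecated html() in favour of read_html(), so the page-loading step can be updated like this (the rest of the pipeline is unchanged):

```r
library(rvest)

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"

# read_html() is the replacement for the deprecated html()
page <- read_html(url)
```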

Thanks in advance.

Solution

That website doesn't use an HTML table, so html_table() can't find anything. It actually uses div elements with the classes column and data lastcolumn.

So you can do something like

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
valuation_col <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@class="column"]')
    
valuation_data <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@class="data lastcolumn"]')
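The two node sets can then be paired up into a data frame with html_text(). A minimal sketch, using a small inline stand-in for the page markup (the class names are the ones above; it assumes labels and values on the real page line up one-to-one):

```r
library(rvest)

# Miniature stand-in for the profile page's markup, so the example
# runs without hitting the network
html <- '<div class="section">
  <div class="column">P/E Ratio</div><div class="data lastcolumn">7.5</div>
  <div class="column">Price to Book</div><div class="data lastcolumn">1.2</div>
</div>'

page <- read_html(html)

labels <- page %>%
  html_nodes(xpath = '//*[@class="column"]') %>%
  html_text(trim = TRUE)

values <- page %>%
  html_nodes(xpath = '//*[@class="data lastcolumn"]') %>%
  html_text(trim = TRUE)

# Pair the two node sets positionally into one data frame
valuation <- data.frame(label = labels, value = values,
                        stringsAsFactors = FALSE)
```

For the live page, replace the inline string with read_html(url).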

Or even

url %>%
  read_html() %>%
  html_nodes(xpath='//*[@class="section"]')

To get you most of the way there.

Please also read their terms of use - particularly 3.4.
