Scraping an HTML table and its href links in R

Views: 110 | Published: 2020/11/24 5:55:48 | Tags: html, r, xpath, rvest

Question

I am trying to download a table that contains text and links. I can successfully download the table with the link text "Pass". However, I would like to capture the actual href URL instead of the text.

library(dplyr)
library(rvest)
library(XML)
library(httr)
library(stringr)

link <- "http://www.qimedical.com/resources/method-suitability/"

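# Parse the page once; qi_webpage is reused later
qi_webpage <- read_html(link)

# Take the first table on the page and drop its first column
qi_table <- html_nodes(qi_webpage, 'table')
qi <- html_table(qi_table, header = TRUE)[[1]]
qi <- qi[,-1]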

The code above gives a nice data frame. However, the last column contains only the text "Pass", whereas I would like to have the link associated with it. I tried the following to add the links, but they do not line up with the correct rows:

qi_get <- GET("http://www.qimedical.com/resources/method-suitability/")
qi_html <- htmlParse(content(qi_get, as="text"))

qi.urls <- xpathSApply(qi_html, "//*/td[7]/a", xmlAttrs, "href")
qi.urls <- qi.urls[1,]

qi <- mutate(qi, "MSTLink" = (ifelse(qi$`Study Protocol(click to download certification)` == "Pass", (t(qi.urls)), "")))

I know little about HTML, CSS, etc., so I am not sure what I am missing to accomplish this properly.

Thanks!

Recommended answer

You're looking for <a> elements inside table cells (<td>), and you want the value of each one's href attribute. Here is one way, which returns a character vector with the URLs of all the PDF downloads:

qi_webpage %>%
  html_nodes(xpath = "//td/a") %>% 
  html_attr("href")
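
The vector above contains one entry per link found, so it can be shorter than the table when some rows have no link, which is one likely reason the URLs ended up on the wrong rows in the question. The following is a minimal sketch of one way to keep the links aligned with the rows, not necessarily how the original answerer would do it. It assumes the link sits in the last cell of each data row; the inline HTML, the file paths, and the names page, rows, links and tbl are hypothetical stand-ins for the real page.

```r
library(rvest)

# Hypothetical stand-in for the real page: a small table whose last
# column sometimes contains a link and sometimes does not.
page <- read_html('
<table>
  <tr><th>Product</th><th>Result</th></tr>
  <tr><td>A</td><td><a href="/docs/a.pdf">Pass</a></td></tr>
  <tr><td>B</td><td></td></tr>
  <tr><td>C</td><td><a href="/docs/c.pdf">Pass</a></td></tr>
</table>')

# Data rows only: rows that contain at least one <td> (skips the header row).
rows <- html_nodes(page, xpath = "//table[1]//tr[td]")

# html_node() (singular) returns exactly one result per row, with a missing
# node where a row has no match, so html_attr() yields NA for those rows.
links <- rows %>%
  html_node(xpath = "./td[last()]/a") %>%
  html_attr("href")

# The link vector now has one entry per data row, so it can be attached
# directly as a new column.
tbl <- html_table(page, header = TRUE)[[1]]
tbl$Link <- links
print(tbl)
```

For the page in the question, the same idea would mean selecting the rows of the first table from qi_webpage and using the position of the link column (td[7] in the asker's attempt) in place of td[last()]; whether that position is right depends on the page's actual markup.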
