rvest package - Is it possible for html_text() to store an NA value if it does not find an attribute?

Views: 63 · Posted: 2021/7/14 18:38:50 · Tags: r, rvest


Problem description

As the title states, I'm curious if it is possible for the html_text() function from the rvest package to store an NA value if it is not able to find an attribute on a specific page.

I'm currently running a scrape over 199 pages (which works fine; tested on a few variables already).

Currently, when I search for a value that is only present on some (136) of the 199 pages, html_text() returns a vector of only 136 strings. This is not useful because, without NAs, I am unable to determine which pages contained the variable in question.

I see that html_attr() is able to receive a default input, but html_text() is not. Any tips?

Thank you so much!

Solution

If you create a new function to wrap error handling, it'll keep the %>% pipe cleaner and easier to grok for your future self and others:

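library(rvest)

# Wrap html_text() so that an error or an empty result becomes NA
# instead of a zero-length vector.
html_text_na <- function(x, ...) {
  txt <- try(html_text(x, ...), silent = TRUE)
  # || short-circuits, which is the right operator for a scalar condition
  if (inherits(txt, "try-error") || length(txt) == 0) {
    return(NA_character_)
  }
  txt
}

base_url <- "http://www.saem.org/membership/services/residency-directory?RecordID=%d"

record_id <- c(1291, 1000, 1166, 1232, 999)

sapply(record_id, function(i) {
  # read_html() replaces html(), which is deprecated in current rvest
  read_html(sprintf(base_url, i)) %>%
    html_nodes("#drpict tr:nth-child(6) .text") %>%
    html_text_na() %>%
    as.numeric()
})

## [1]  8 NA 10 27 NA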

Also, by running sapply() over the record_id vector, you automagically get back a vector of whatever value you're trying to extract, with one element per record ID and in the same order.
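As a possible alternative to the wrapper above (not part of the original answer), newer rvest releases offer html_element(), the singular form of html_elements(). When nothing matches, it yields a "missing" node, and html_text() turns that into NA. The sketch below assumes rvest 1.0 or later and uses made-up inline HTML strings in place of the real pages, so the page contents and the ".text" selector are purely illustrative.

```r
library(rvest)

# Hypothetical stand-ins for scraped pages; the second one has no ".text" node.
pages <- c(
  "<html><body><span class='text'>8</span></body></html>",
  "<html><body><p>no value here</p></body></html>",
  "<html><body><span class='text'>10</span></body></html>"
)

values <- sapply(pages, function(p) {
  read_html(p) %>%
    html_element(".text") %>%  # singular form: a missing node if nothing matches
    html_text() %>%            # a missing node becomes NA_character_
    as.numeric()
}, USE.NAMES = FALSE)

values
```

Because each page contributes exactly one element, the result lines up with the input vector, so the positions of the NAs identify the pages that lack the value.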
