Within a function, return NA or 0 if an xpath is not found

Views: 85 Published: 2021/5/2 20:55:26 Tags: r web-scraping xpath dplyr rvest


Question

Within a function, I need to return "NA", or better "0", for an xpath item that is NOT (!) on the page. On most of the pages I scrape from the list the xpath item exists, but on some it does not. If it doesn't exist, the returned vectors have different lengths and cannot be combined any further.

return_data <- function(url) {
  page <- url %>% read_html 
  tibble(YealyRevenue = page %>%
           html_nodes(xpath = '//div[contains(h4, "YealyRevenue")]') %>%
           html_text(trim = TRUE) %>%
           parse_number(), 
         Cashflow = page %>% 
           html_nodes(xpath = '//div[contains(h4, "Cashflow:")]') %>% 
           html_text(trim = TRUE) %>% 
           parse_number(), 
         Spendings =  page %>% 
           html_nodes(xpath = '//*[@id="Spendings"]/a' ) %>% 
           html_text(trim = TRUE) %>% 
           parse_number(), 
         Return = page %>% 
           html_nodes(xpath = '//*[@id="Return"]/div[1]/div[2]/div/div[2]/div[2]/h1') %>%
           html_text(trim = TRUE))
}

The last item is the one that does not exist on every page I scrape.

Return = page %>% 
           html_nodes(xpath = '//*[@id="Return"]/div[1]/div[2]/div/div[2]/div[2]/h1') %>%
           html_text(trim = TRUE)

So for this, I would need something like:

"If this xpath is not found, return "0"."

Thanks for any leads!

Recommended answer

We could wrap the chain in a tryCatch and specify the return value when there is an error. It is also possible to add further return values in case there are warnings.

library(rvest)   # read_html(), html_nodes(), html_text(), %>%
library(readr)   # parse_number()
library(tibble)  # tibble()

return_data <- function(url) {
  page <- url %>% read_html()
  YealyRevenue <- page %>%
    html_nodes(xpath = '//div[contains(h4, "YealyRevenue")]') %>%
    html_text(trim = TRUE) %>%
    parse_number()
  Cashflow <- page %>%
    html_nodes(xpath = '//div[contains(h4, "Cashflow:")]') %>%
    html_text(trim = TRUE) %>%
    parse_number()
  Spendings <- page %>%
    html_nodes(xpath = '//*[@id="Spendings"]/a') %>%
    html_text(trim = TRUE) %>%
    parse_number()
  # Fall back to NA if extracting this item raises an error
  Return <- tryCatch({
    page %>%
      html_nodes(xpath =
        '//*[@id="Return"]/div[1]/div[2]/div/div[2]/div[2]/h1') %>%
      html_text(trim = TRUE)
  },
  error = function(err) {
    message("xpath doesn't exist")
    return(NA)
  })

  # Column names must match the variables defined above
  return(tibble(YealyRevenue, Cashflow, Spendings, Return))
}
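
One caveat worth checking: as far as I know, html_nodes() does not raise an error when an xpath matches nothing. It returns an empty node set, and html_text() then returns character(0), so the error handler above may never run. Below is a minimal sketch of a length check that would cover that case. The helper name first_or_default is hypothetical (it is not part of rvest), and the sample strings are made up to stand in for scraped text.

```r
# Hypothetical helper, not part of rvest: return `default` when a scraping
# step produced an empty vector, otherwise keep the first value found.
first_or_default <- function(x, default = NA_character_) {
  if (length(x) == 0) default else x[[1]]
}

# Made-up stand-ins for what html_text() gives back on two kinds of pages.
text_when_present <- c("12 %")
text_when_absent  <- character(0)

found  <- first_or_default(text_when_present)
absent <- first_or_default(text_when_absent)
zero   <- first_or_default(text_when_absent, default = "0")

print(found)
print(absent)
print(zero)
```

Inside return_data, this would go at the end of the Return chain, i.e. after html_text(trim = TRUE), so that the column always has exactly one value. Another option may be rvest's singular html_node(), which returns a missing node when nothing matches, so html_text() yields NA for it.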
