Within a function, return NA or 0 if an XPath is not found

Question

Within a function, I need to return "NA", or better "0", for an XPath item that is NOT (!) on the page. On most pages I scrape from the list the XPath item exists, but on some it does not. If it doesn't exist, the returned vectors have unequal lengths and cannot be combined further.

return_data <- function(url) {
  page <- url %>% read_html 
  tibble(YealyRevenue = page %>%
           html_nodes(xpath = '//div[contains(h4, "YealyRevenue")]') %>%
           html_text(trim = TRUE) %>%
           parse_number(), 
         Cashflow = page %>% 
           html_nodes(xpath = '//div[contains(h4, "Cashflow:")]') %>% 
           html_text(trim = TRUE) %>% 
           parse_number(), 
         Spendings =  page %>% 
           html_nodes(xpath = '//*[@id="Spendings"]/a' ) %>% 
           html_text(trim = TRUE) %>% 
           parse_number(), 
         Return = page %>% 
           html_nodes(xpath = '//*[@id="Return"]/div[1]/div[2]/div/div[2]/div[2]/h1') %>%
           html_text(trim = TRUE))
}

The last item is the one that does not exist on every page I scrape.

Return = page %>% 
           html_nodes(xpath = '//*[@id="Return"]/div[1]/div[2]/div/div[2]/div[2]/h1') %>%
           html_text(trim = TRUE)

So for this, I would need something like:

"If this XPath is not found, return 0."

Thanks for any leads!

Recommended answer

We could wrap the chain in a tryCatch and specify the value to return when there is an error. It is also possible to add further handlers, for example one that returns a different value when there is a warning.

library(rvest)   # read_html(), html_nodes(), html_text(), %>%
library(readr)   # parse_number()
library(tibble)  # tibble()

return_data <- function(url) {
  page <- url %>% read_html()
  YealyRevenue <- page %>%
    html_nodes(xpath = '//div[contains(h4, "YealyRevenue")]') %>%
    html_text(trim = TRUE) %>%
    parse_number()
  Cashflow <- page %>%
    html_nodes(xpath = '//div[contains(h4, "Cashflow:")]') %>%
    html_text(trim = TRUE) %>%
    parse_number()
  Spendings <- page %>%
    html_nodes(xpath = '//*[@id="Spendings"]/a') %>%
    html_text(trim = TRUE) %>%
    parse_number()
  # Return NA if extracting this item raises an error
  Return <- tryCatch({
    page %>%
      html_nodes(xpath =
        '//*[@id="Return"]/div[1]/div[2]/div/div[2]/div[2]/h1') %>%
      html_text(trim = TRUE)
  },
  error = function(err) {
    message("xpath doesn't exist")
    NA
  })

  # Use the variable names defined above
  tibble(YealyRevenue, Cashflow, Spendings, Return)
}
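
One caveat worth checking: as far as I know, html_nodes() does not raise an error when an XPath matches nothing. It returns an empty node set, so html_text() yields character(0) and the error handler never runs. Below is a minimal sketch of a length-based fallback in base R only; or_default is a hypothetical helper name introduced here for illustration, not part of rvest or the original answer.

```r
# Hypothetical helper: return the first element of x, or a default
# when x is empty (e.g. character(0) from an XPath with no match).
or_default <- function(x, default = NA_character_) {
  if (length(x) == 0) default else x[[1]]
}

or_default(character(0))         # empty input: falls back to NA
or_default(character(0), "0")    # empty input: falls back to "0"
or_default(c("12 %", "7 %"))     # non-empty input: keeps the first value
```

Inside return_data, the chain for Return could then end with `%>% or_default("0")`, so that the column always has length one. Another option that may be simpler is rvest's html_node() (singular), which, to my knowledge, gives NA from html_text() when nothing matches.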
