Skip errors in R for loops and also pause the process in each iteration

Problem description

I have two questions regarding loops in R.

1) I'm using the XML package to scrape some tables from a website and combine them with rbind. The following code works without issues as long as the price data and tables are present on the given web pages.

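library(XML)    # htmlParse(), getNodeSet(), readHTMLTable()
library(RCurl)  # getURL()

url.list <- c("www1", "www2", "www3")
big.data <- NULL  # start with an empty result so the first rbind() works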

for(url_var in url.list)
{
  url <- url_var
  url.parsed <- htmlParse(getURL(url), asText = TRUE)
  tableNodes <- getNodeSet(url.parsed, '//*[@id="table"]/table')
  newdata <- readHTMLTable(tableNodes[[1]], header=F, stringsAsFactors=F)
  big.data <- rbind(newdata,  big.data)
  Sys.sleep(30)
}

But sometimes a web page does not have the corresponding table (in that case I'm left with a one-variable table containing the message "No current prices reported."), and my loop stops with the following error message because the numbers of table columns do not match:

 Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match 
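A minimal offline reproduction of this error, assuming a normal page yields a three-column table and a page without prices yields a one-column table holding only the message (the column contents below are made up for illustration):

```r
# Hypothetical table from a page that has price data (3 columns)
good <- data.frame(V1 = "Item A", V2 = "10.5", V3 = "USD",
                   stringsAsFactors = FALSE)
# Hypothetical table from a page without prices (1 column)
empty <- data.frame(V1 = "No current prices reported.",
                    stringsAsFactors = FALSE)

# rbind() on data frames with different column counts raises an error;
# capture the condition object instead of letting it stop the script
res <- tryCatch(rbind(empty, good), error = function(e) e)
inherits(res, "error")   # TRUE
conditionMessage(res)
```

Because this is an ordinary R error condition, it can be intercepted with tryCatch like any other error.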

I want R to ignore the error and go ahead with the next web page (skipping the one that has a different number of columns).

2) At the end of the loop I have Sys.sleep(30). Does it force R to wait 30 seconds before it tries the next web page?

Thanks.

Recommended answer

As @RuiBarradas mentioned in the comments, tryCatch is the way to handle errors (and even warnings) in R. In your case, what you need is to go to the next iteration whenever an error occurs, so you can do something like this:

for (url_var in url.list) {
    url <- url_var
    url.parsed <- htmlParse(getURL(url), asText = TRUE)
    ok <- tryCatch({
        # Try to run the code within these braces
        tableNodes <- getNodeSet(url.parsed, '//*[@id="table"]/table')
        newdata <- readHTMLTable(tableNodes[[1]], header = FALSE, stringsAsFactors = FALSE)
        big.data <- rbind(newdata, big.data)
        TRUE  # success flag
    },
        # The error handler must be a function, and `next` cannot be
        # called from inside it, so the handler just returns a flag
        error = function(e) FALSE)
    # On error, go to the next iteration; Sys.sleep(30) is skipped in that case
    if (!ok) next
    Sys.sleep(30)
}
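Since the scraping code needs network access, here is an offline sketch of the same skip-and-continue pattern. It assumes a hypothetical fetch_table() function standing in for the getURL()/readHTMLTable() steps: it returns a three-column table for normal pages and the one-column "no prices" table for "www2". The sleep is shortened so the example runs quickly.

```r
# Hypothetical stand-in for the scraping steps (not part of the XML package)
fetch_table <- function(url) {
  if (url == "www2") {
    return(data.frame(V1 = "No current prices reported.",
                      stringsAsFactors = FALSE))
  }
  data.frame(V1 = url, V2 = "10.5", V3 = "USD", stringsAsFactors = FALSE)
}

url.list <- c("www1", "www2", "www3")
big.data <- NULL          # rbind(NULL, df) simply returns df
skipped  <- character(0)  # pages that failed

for (url_var in url.list) {
  ok <- tryCatch({
    newdata  <- fetch_table(url_var)
    big.data <- rbind(big.data, newdata)  # errors if column counts differ
    TRUE
  }, error = function(e) {
    message("Skipping ", url_var, ": ", conditionMessage(e))
    FALSE
  })
  if (!ok) {
    skipped <- c(skipped, url_var)  # record the failure, then move on
    next
  }
  Sys.sleep(0.1)  # shortened from 30 seconds for the example
}

print(big.data)
print(skipped)
```

Because the assignment to big.data happens inside the tryCatch expression, a failing rbind() leaves the previously collected rows untouched; only "www1" and "www3" end up in the result.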

And yes, Sys.sleep(30) makes R pause for 30 seconds when it is executed. So if you want R to sleep in every iteration regardless of whether parsing succeeds, move that line in front of the tryCatch call.
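A small sketch of that variant, with the sleep moved ahead of tryCatch. It uses a hypothetical pause() wrapper only to count how often the loop sleeps, and a simulated failure on "www2" in place of a real parse error; the delay is shortened from 30 seconds.

```r
url.list <- c("www1", "www2", "www3")
pauses <- 0L

# Hypothetical wrapper around Sys.sleep() that counts calls
pause <- function(seconds) {
  pauses <<- pauses + 1L
  Sys.sleep(seconds)
}

for (url_var in url.list) {
  pause(0.1)  # before tryCatch: runs on every iteration, even failing ones
  ok <- tryCatch({
    if (url_var == "www2") stop("simulated parse failure")  # hypothetical
    TRUE
  }, error = function(e) FALSE)
  if (!ok) next
  # ... process the successfully parsed page here ...
}

print(pauses)  # 3
```

Here the failing page still triggers a pause, which keeps the request rate to the server constant.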

See the well-written answer to "How to write trycatch in R" for a more detailed explanation of tryCatch.