How to force an error if non-finite values (NA, NaN, or Inf) are encountered

Problem description

There's a conditional debugging flag I miss from Matlab: dbstop if infnan. If set, this condition will stop code execution when an Inf or NaN is encountered (IIRC, Matlab doesn't have NAs).

How might I achieve this in R in a more efficient manner than testing all objects after every assignment operation?

At the moment, the only ways I see to do this are via hacks like the following:

  1. Manually insert a test after all places where these values might be encountered (e.g. a division, where division by 0 may occur). The testing would be to use is.finite(), described in this Q & A, on every element.
  2. Use body() to modify the code to call a separate function, after each operation or possibly just each assignment, which tests all of the objects (and possibly all objects in all environments).
  3. Modify R's source code (?!?)
  4. Attempt to use tracemem to identify those variables that have changed, and check only these for bad values.
  5. (New - see note 2) Use some kind of call handlers / callbacks to invoke a test function.

The 1st option is what I am doing at present. This is tedious, because I can't guarantee I've checked everything. The 2nd option will test everything, even if an object hasn't been updated. That is a massive waste of time. The 3rd option would involve modifying assignments of NA, NaN, and infinite values (+/- Inf), so that an error is produced. That seems like it's better left to R Core. The 4th option is like the 2nd - I'd need a call to a separate function listing all of the memory locations, just to ID those that have changed, and then check the values; I'm not even sure this will work for all objects, as a program may do an in-place modification, which seems like it would not invoke the duplicate function.
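For illustration, the manual approach of option 1 might look like the sketch below. The names assertFinite() and safeDiv() are hypothetical and do not come from base R or any package; the point is only that the check must be repeated by hand after every step that could produce a bad value.

```r
# Hypothetical helper (not part of base R): stop if x holds NA, NaN, Inf, or -Inf.
assertFinite <- function(x, name = deparse(substitute(x))) {
  # is.finite() is FALSE for NA, NaN, Inf, and -Inf
  if (is.numeric(x) && !all(is.finite(x))) {
    stop("non-finite value found in ", name, call. = FALSE)
  }
  invisible(x)
}

# Hypothetical example function with one manually inserted check.
safeDiv <- function(x, y) {
  z <- x / y
  assertFinite(z)   # has to be repeated after every risky operation
  z
}

safeDiv(1, 2)       # passes the check
# safeDiv(0, 0)     # 0/0 is NaN, so this call would stop with an error
```

This makes the drawback concrete: any operation that is not followed by such a call goes unchecked.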

Is there a better approach that I'm missing? Maybe some clever tool by Mark Bravington, Luke Tierney, or something relatively basic - something akin to an options() parameter or a flag when compiling R?

Example code: Here is some very simple example code to test with, incorporating the addTaskCallback function proposed by Josh O'Brien. The code isn't interrupted, but an error does occur in the first scenario, while no error occurs in the second case (i.e. badDiv(0,0,FALSE) doesn't abort). I'm still investigating callbacks, as this looks promising.

badDiv  <- function(x, y, flag){
    z = x / y
    if(flag == TRUE){
        return(z)
    } else {
        return(FALSE)
    }
}

addTaskCallback(stopOnNaNs)
badDiv(0, 0, TRUE)

addTaskCallback(stopOnNaNs)
badDiv(0, 0, FALSE)


Note 1. I'd be satisfied with a solution for standard R operations, though a lot of my calculations involve objects used via data.table or bigmemory (i.e. disk-based memory mapped matrices). These appear to have somewhat different memory behaviors than standard matrix and data.frame operations.

Note 2. The callbacks idea seems a bit more promising, as this doesn't require me to write functions that mutate R code, e.g. via the body() idea.

Note 3. I don't know whether or not there is some simple way to test the presence of non-finite values, e.g. meta information about objects that indexes where NAs, Infs, etc. are stored in the object, or if these are stored in place. So far, I've tried Simon Urbanek's inspect package, and have not found a way to divine the presence of non-numeric values.

Follow-up: Simon Urbanek has pointed out in a comment that such information is not available as meta information for objects.

Note 4. I'm still testing the ideas presented. Also, as suggested by Simon, testing for the presence of non-finite values should be fastest in C/C++; that should surpass even compiled R code, but I'm open to anything. For large datasets, e.g. on the order of 10-50GB, this should be a substantial savings over copying the data. One may get further improvements via use of multiple cores, but that's a bit more advanced.
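Short of writing C/C++, one cheap screening test that stays in R is to look at sum(x): the summation runs in compiled code, returns a single number, and any NA, NaN, or Inf in x makes the total non-finite. The sketch below is only an assumption about what might be fast enough, not a benchmarked result, and hasNonFinite() is a hypothetical name.

```r
# Hypothetical helper: TRUE if a numeric vector or matrix holds NA, NaN, or Inf.
hasNonFinite <- function(x) {
  if (is.integer(x)) return(anyNA(x))   # integers can only be NA, never NaN/Inf
  stopifnot(is.double(x))
  # One pass in compiled code, no large logical vector allocated.
  # Opposite infinities sum to NaN, which is also non-finite.
  if (is.finite(sum(x))) return(FALSE)
  # A large but finite total can overflow to Inf, so confirm with the exact test.
  !all(is.finite(x))
}

hasNonFinite(c(1, 2, 3))
hasNonFinite(c(1, NaN, 3))
```

The exact test with is.finite() only runs when the sum looks suspicious, so clean data costs a single summation.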

Solution

The idea sketched below (and its implementation) is very imperfect. I'm hesitant to even suggest it, but: (a) I think it's kind of interesting, even in all of its ugliness; and (b) I can think of situations where it would be useful. Given that it sounds like you are right now manually inserting a check after each computation, I'm hopeful that your situation is one of those.

Mine is a two-step hack. First, I define a function nanDetector(), which is designed to detect NaNs in several of the object types that might be returned by your calculations. Then, I use addTaskCallback() to call nanDetector() on .Last.value after each top-level task/calculation is completed. When it finds a NaN in one of those returned values, it throws an error, which you can use to avoid any further computations.

Among its shortcomings:

  • If you do something like setting options(error = recover), it's hard to tell where the error was triggered, since the error is always thrown from inside of stopOnNaNs().

  • When it throws an error, stopOnNaNs() is terminated before it can return TRUE. As a consequence, it is removed from the task list, and you'll need to reset it with addTaskCallback(stopOnNaNs) if you want to use it again. (See the 'Arguments' section of ?addTaskCallback for more details.)

Without further ado, here it is:


# Sketch of a function that tests for NaNs in several types of objects
nanDetector <- function(X) {
   # To examine data frames
   if(is.data.frame(X)) { 
       return(any(unlist(sapply(X, is.nan))))
   }
   # To examine vectors, matrices, or arrays
   if(is.numeric(X)) {
       return(any(is.nan(X)))
   }
   # To examine lists, including nested lists
   if(is.list(X)) {
       return(any(rapply(X, is.nan)))
   }
   return(FALSE)
}

# Set up the taskCallback
stopOnNaNs <- function(...) {
    if(nanDetector(.Last.value)) {stop("NaNs detected!\n")}
    return(TRUE)
}
addTaskCallback(stopOnNaNs)


# Try it out
j <- 1:00
y <- rnorm(99)
l <- list(a=1:4, b=list(j=1:4, k=NaN))
# Error in function (...)  : NaNs detected!

# Subsequent time consuming code that could be avoided if the
# error thrown above is used to stop its evaluation.
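
Since the question asks about NA and Inf as well as NaN, the same idea could be extended along the lines below. This is a sketch under assumptions, not a tested replacement: nonFiniteDetector() and stopOnNonFinite() are hypothetical names, only numeric data are inspected (a logical or character NA is not flagged), and the callback reads the value argument that addTaskCallback() passes to its handler instead of .Last.value.

```r
# Hypothetical variant of nanDetector(): flags NA, NaN, Inf, and -Inf.
nonFiniteDetector <- function(X) {
  if (is.list(X)) {
    # Covers data frames and nested lists: recurse into each element
    for (el in X) {
      if (nonFiniteDetector(el)) return(TRUE)
    }
    return(FALSE)
  }
  if (is.numeric(X)) {
    # Vectors, matrices, and arrays of integers or doubles
    return(!all(is.finite(X)))
  }
  FALSE   # non-numeric objects (character, factor, logical, ...) are ignored
}

# Task callbacks are called with (expr, value, ok, visible).
stopOnNonFinite <- function(expr, value, ok, visible) {
  if (nonFiniteDetector(value)) stop("Non-finite values detected!")
  TRUE    # returning TRUE keeps the callback registered
}

# In an interactive session, register it with:
# addTaskCallback(stopOnNonFinite)
```

The same caveats apply as above: the error is raised from inside the callback, and the callback has to be registered again after it fires.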
