Round to a multiple and filter in data.table


Question

I have a very interesting problem, though I'd rather not have one. I have to round a number to the closest multiple, so I followed the solution here. It used to work OK, until I discovered a bug with data.table:

library(data.table)
options(digits = 20) # to see number representation
mround <- function (number, multiple) {
   return(multiple * round(number / multiple))
}
DT = data.table(a = mround(112.3, 0.1), b = "B")
DT[a == 112.3,] # works as expected, i.e returns one row
DT[a == 112.3 & b == 'B', ] # doesn't work

To be fair, with a data.frame even the first filter doesn't work. Any ideas how to fix that?

Recommended answer

It's a floating-point precision problem. Filtering with a tolerance instead of exact equality, for example DT[abs(a - 112.3) < 1e-6 & b == 'B', ] (an error margin of 0.000001), gives the expected result.
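
The effect can be reproduced without data.table. The following is a minimal sketch using a plain data.frame (DF, exact_rows and tol_rows are names chosen here for illustration); it contrasts the exact filter from the question with the tolerance-based one:

```r
# The original helper from the question.
mround <- function(number, multiple) multiple * round(number / multiple)

# DF, exact_rows and tol_rows are illustrative names, not from the original post.
DF <- data.frame(a = mround(112.3, 0.1), b = "B")

# Exact comparison: the stored value is not bit-identical to the literal 112.3.
exact_rows <- DF[DF$a == 112.3 & DF$b == "B", ]

# Tolerance comparison: accept anything within 1e-6 of the target.
tol_rows <- DF[abs(DF$a - 112.3) < 1e-6 & DF$b == "B", ]

print(nrow(exact_rows))
print(nrow(tol_rows))
```

The same logical expression can be used unchanged inside DT[...] for a data.table.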

If you want more precision, you can use .Machine$double.eps^0.5 as the tolerance, as all.equal does.
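
As a rough sketch of that tolerance, the helper below (near_equal is a hypothetical name, not part of base R or data.table) compares the absolute difference against .Machine$double.eps^0.5. It is shown next to all.equal, which uses the same value as its default tolerance:

```r
# near_equal is a hypothetical helper name used only for illustration.
near_equal <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}

# The value produced by the original mround(112.3, 0.1).
x <- 0.1 * round(112.3 / 0.1)

strict   <- x == 112.3                   # exact comparison
tolerant <- near_equal(x, 112.3)         # absolute difference below sqrt(eps)
base_r   <- isTRUE(all.equal(x, 112.3))  # base R's tolerance-based comparison

print(c(strict = strict, tolerant = tolerant, all.equal = base_r))
```

Note that all.equal returns a character description rather than FALSE when the values differ, which is why it is wrapped in isTRUE here.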

The general advice is never to test floats for exact equality. Instead, compare the absolute difference against a tolerance close to machine precision, which gets around the precision drift; more details here

One way to fix your problem could be to refactor your function to:

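# digits defaults to the number of decimal places in multiple
# (this assumes multiple prints with a decimal point, e.g. 0.1)
mround <- function(number, multiple,
                   digits = nchar(strsplit(as.character(multiple), ".", fixed = TRUE)[[1]][2])) {
  round(multiple * round(number / multiple), digits)
}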

I used a "convoluted" method to derive the number of decimal places from the multiple and pass it as the default for digits; adapt it to your needs (for example, you could use 2 here, or force the precision when calling).
I removed the unnecessary return: in R, return is itself a function call, and a function already returns the value of its last expression, so it only makes the interpreter look up one more function.
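
To make the default-argument expression easier to follow, here is a small sketch that pulls it out into its own function (decimal_places is a hypothetical name introduced here, not part of the original answer) and checks what it yields for a few multiples:

```r
# decimal_places is a hypothetical name for the default-argument expression.
decimal_places <- function(multiple) {
  parts <- strsplit(as.character(multiple), ".", fixed = TRUE)[[1]]
  nchar(parts[2])  # NA when there is no decimal part, e.g. multiple = 5
}

mround <- function(number, multiple, digits = decimal_places(multiple)) {
  round(multiple * round(number / multiple), digits)
}

d1 <- decimal_places(0.1)    # one decimal place
d2 <- decimal_places(0.25)   # two decimal places
fixed <- mround(112.3, 0.1) == 112.3  # the outer round() snaps the result back

print(c(d1, d2))
print(fixed)
```

Because the derived value is NA for whole-number multiples, passing digits explicitly (for example digits = 0) is the safer choice in that case.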

This way your output should be precise enough, but you'll still have corner cases:

> mround(112.37,0.2)
[1] 112.40000000000001

To use floats in joins, you can use (courtesy of David Arenburg):

setNumericRounding(1)
DT[.(112.3, 'B'), nomatch = 0L, on = .(a, b)]
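
Since setNumericRounding changes a session-wide setting, one possible pattern is to save the current value and restore it afterwards. This is only a sketch, assuming data.table is installed; old_rounding and res are illustrative names, and the original mround from the question is used to build DT:

```r
library(data.table)

# The original helper from the question.
mround <- function(number, multiple) multiple * round(number / multiple)
DT <- data.table(a = mround(112.3, 0.1), b = "B")

old_rounding <- getNumericRounding()  # remember the session-wide setting
setNumericRounding(1)                 # round off the last byte of doubles when joining
res <- DT[.(112.3, "B"), nomatch = 0L, on = .(a, b)]
setNumericRounding(old_rounding)      # restore the previous behaviour

print(res)
```

Restoring the setting keeps other joins and groupings in the same session from being affected by the looser comparison.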
