Round to a multiple and filter in data.table
Question
I have a very interesting problem, though I'd rather not have one. I have to round a number to the closest multiple, so I followed the solution here. It used to work fine, until I discovered this bug with data.table:
library(data.table)
options(digits = 20) # to see number representation
mround <- function (number, multiple) {
return(multiple * round(number / multiple))
}
DT = data.table(a = mround(112.3, 0.1), b = "B")
DT[a == 112.3, ] # works as expected, i.e. returns one row
DT[a == 112.3 & b == 'B', ] # doesn't work
To be fair, with a data.frame even the first filter doesn't work. Any ideas how to fix that?
Recommended answer
It's a floating point precision problem. Comparing with an error margin of 0.000001 gives the proper result:
DT[abs(a - 112.3) < 1e-6 & b == 'B', ]
If you want more precision you can use .Machine$double.eps^0.5 as the margin, as all.equal does.
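As a minimal sketch of the tolerance comparison described above, the following uses a base data.frame (where, as the question notes, the exact filter fails too) so that it runs without extra packages. The names df, tol, exact_match and close_match are only illustrative; the same condition can be written inside DT[...].

```r
# mround() as defined in the question
mround <- function(number, multiple) {
  multiple * round(number / multiple)
}

df <- data.frame(a = mround(112.3, 0.1), b = "B")

# Exact comparison: the stored value is not the same double as the literal 112.3
exact_match <- df[df$a == 112.3 & df$b == "B", ]

# Tolerance comparison, using the same tolerance value all.equal() uses by default
tol <- .Machine$double.eps^0.5
close_match <- df[abs(df$a - 112.3) < tol & df$b == "B", ]

nrow(exact_match)
nrow(close_match)
```

The exact filter should come back empty while the tolerance filter keeps the row. Note that this compares the absolute difference, whereas all.equal compares a relative difference, so the two can disagree for very large or very small values.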
General advice: never test floats for equality. Instead, compare their difference against a tolerance close to machine precision, which gets around the precision drift (between 0 and 1). More details here.
One way to fix your problem could be to refactor your function to:
mround <- function (number, multiple, digits=nchar(strsplit(as.character(multiple),".",fixed=TRUE)[[1]][2])) {
round(multiple * round(number / multiple),digits)
}
I used a "convoluted" method to derive the number of decimal places from the multiple and use it as the default for digits. Adapt it to your needs (you could use 2 here, for example, or force the precision when calling).
I removed the unnecessary return, which only makes the interpreter perform an extra function call at the end of the function; the value of the last expression is returned anyway.
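To make the default for digits easier to follow, here is a sketch that pulls the same expression out into a helper. The name default_digits is hypothetical and not part of the answer; the logic is unchanged.

```r
# Hypothetical helper: count the characters after the decimal point of `multiple`
default_digits <- function(multiple) {
  parts <- strsplit(as.character(multiple), ".", fixed = TRUE)[[1]]
  nchar(parts[2])
}

default_digits(0.1)   # one decimal place
default_digits(0.25)  # two decimal places

# Same refactored mround(), with the default spelled via the helper
mround <- function(number, multiple, digits = default_digits(multiple)) {
  round(multiple * round(number / multiple), digits)
}

# The comparison from the question, now against the rounded result
mround(112.3, 0.1) == 112.3
```

One caveat of this approach: a multiple with no fractional part, such as 5, has no second element after the split, so parts[2] is missing and digits would need to be passed explicitly.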
This way your output should be precise enough, but you'll still have corner cases:
> mround(112.37,0.2)
[1] 112.40000000000001
To use floats in joins, you can use (courtesy of David Arenburg):
setNumericRounding(1)
DT[.(112.3, 'B'), nomatch = 0L, on = .(a, b)]