R unique command and table command disagree
Question
I have a strange problem. To illustrate:
a <- c(3.099331946117620972814,
3.099331946117621860992)
> unique(a)
[1] 3.099331946117620972814 3.099331946117621860992
> table(a)
a
3.09933194611762
2
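A minimal base-R check (a sketch, not part of the original post) that the two values really are different doubles and not just different printouts:

```r
a <- c(3.099331946117620972814, 3.099331946117621860992)

# Exact comparison of the stored doubles: they differ
a[1] == a[2]    # FALSE

# 17 significant digits are always enough to tell two distinct doubles apart
sprintf("%.17g", a)

# Absolute gap between them: tiny, below 1e-15
a[2] - a[1]
```

The gap is only a couple of units in the last place, which is why any conversion that keeps 15 or 16 significant digits can make the two values look identical.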
So unique() correctly recognises that the numbers are different after the 15th digit. table(), however, does not consider them different.
This may be expected behaviour but it is causing an error in some of my code as I need them both to agree:
times <- sort(unique(times))
k <- as.numeric(table(times))
times is correctly pulling out the unique times. k is supposed to be the count of the number of times each time occurs, but because of the above issue it doesn't do this correctly.
Does anyone have a suggestion for getting unique and table to agree (or another workaround)?
Answer
Trying to use unique or table on floating-point numbers is conceptually problematic from the computer's standpoint. This topic is strongly related to R FAQ 7.31, which includes this excerpt:
The only numbers that can be represented exactly in R’s numeric type are integers and fractions whose denominator is a power of 2. All other numbers are internally rounded to (typically) 53 binary digits accuracy. As a result, two floating point numbers will not reliably be equal unless they have been computed by the same algorithm, and not always even then. For example,
R> a <- sqrt(2)
R> a * a == 2
[1] FALSE
R> a * a - 2
[1] 4.440892e-16
R> print(a * a, digits = 18)
[1] 2.00000000000000044
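The usual remedy in that FAQ is to compare with a tolerance rather than with ==. A short base-R sketch; the explicit threshold in the last line is an illustrative choice, not a universal constant:

```r
a <- sqrt(2)

a * a == 2                     # FALSE: exact equality fails

# all.equal() compares with a relative tolerance (default about 1.5e-8);
# wrap it in isTRUE() because it returns a character string on mismatch
isTRUE(all.equal(a * a, 2))    # TRUE

# An explicit tolerance also works; sqrt(.Machine$double.eps) is an illustrative choice
abs(a * a - 2) < sqrt(.Machine$double.eps)  # TRUE
```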
(Other examples exist; if you are curious, I encourage you to read more in that FAQ topic.)
Because of this, I suggest you decide on a required precision, then use exactly those digits when looking for uniqueness. Using your numbers, you can force the issue with format (and sprintf):
a <- c(3.099331946117620972814,
3.099331946117621860992)
table(format(a, digits = 15))
# 3.09933194611762
# 2
table(format(a, digits = 16))
# 3.099331946117621 3.099331946117622
# 1 1
unique(format(a, digits = 15))
# [1] "3.09933194611762"
unique(format(a, digits = 16))
# [1] "3.099331946117621" "3.099331946117622"
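The same "pick a precision" idea can also be applied numerically instead of through strings. A sketch, assuming 15 significant digits is the precision you want (the variable names are illustrative): round once with signif(), then count with match() and tabulate() so the counts line up with the unique values by construction.

```r
a <- c(3.099331946117620972814, 3.099331946117621860992)

digits <- 15                      # assumed required precision; choose for your data
r <- signif(a, digits)            # round once, numerically

u <- sort(unique(r))              # distinct values at that precision
# match() compares doubles exactly, so each element finds its own entry in u
k <- tabulate(match(r, u), nbins = length(u))

u
k                                 # 2: both values agree to 15 significant digits
```

Counting with table(r) instead would go back through the character conversion, which can collapse values again once the chosen precision exceeds 15 digits.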
For the curious, the reason unique and table differ is rooted somewhere in table's use of factor, which in turn uses as.character(y). If you do as.character(a), you can see it cutting the precision to 15 significant digits (14 after the decimal point here):
as.character(a)
# [1] "3.09933194611762" "3.09933194611762"
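To see the mechanism directly, compare exact uniqueness with uniqueness after the character conversion that factor() performs (a base-R sketch using the values from the question):

```r
a <- c(3.099331946117620972814, 3.099331946117621860992)

length(unique(a))                 # 2: unique() compares the doubles exactly
length(unique(as.character(a)))   # 1: both become the same 15-digit string
nlevels(factor(a))                # 1: factor() builds its levels from as.character()
length(table(a))                  # 1: table() inherits that single level
```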
So, to answer the question you asked: unique and table differ because table ultimately uses as.character, which by default rounds to 15 significant digits here. (Since as.character is a primitive, you'll have to dig into R's low-level C source to figure that one out.)
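If the goal is simply for the counts to line up exactly with sort(unique(times)), one workaround (not part of the original answer) is to skip table() and its string conversion altogether. A sketch with made-up sample data:

```r
# Made-up sample: the second value occurs twice
times <- c(3.099331946117620972814,
           3.099331946117621860992,
           3.099331946117621860992)

u <- sort(unique(times))                           # exact distinct values
k <- tabulate(match(times, u), nbins = length(u))  # exact counts, aligned with u

k   # counts 1 and 2
```

Keeping the unique values in a separate variable matters here: tabulating after overwriting times with its own unique values would count each value only once. Note that this counts exact duplicates only, so it still inherits the floating-point caveats above.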
Underlying the question I answered above is the assumption that using unique on floating-point numbers is a good thing to do (which I argue it is not).
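If values that differ only by rounding noise should count as the same, one option is to group them with an explicit tolerance. This is a sketch with a hypothetical helper, count_with_tolerance() (not a base R function); the default tol = 1e-12 is an arbitrary assumption, and because gaps chain, a single group can span more than tol in total:

```r
# Hypothetical helper (not part of base R): groups sorted values whose
# successive gaps are at most tol, then counts each group.
count_with_tolerance <- function(x, tol = 1e-12) {
  xs <- sort(x)                             # sort() also drops NA values
  if (length(xs) == 0) {
    return(data.frame(value = numeric(0), count = integer(0)))
  }
  grp <- cumsum(c(TRUE, diff(xs) > tol))    # new group id after each gap > tol
  data.frame(
    value = xs[!duplicated(grp)],           # smallest value in each group
    count = tabulate(grp)                   # group sizes, in the same order
  )
}

a <- c(3.099331946117620972814, 3.099331946117621860992, 5)
count_with_tolerance(a)   # two groups: the near-equal pair (count 2) and 5 (count 1)
```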