R unique command and table command disagree

Problem description

I have a strange problem. To illustrate:

a <- c(3.099331946117620972814,
       3.099331946117621860992)

> unique(a)
[1] 3.099331946117620972814 3.099331946117621860992
> table(a)
a
3.09933194611762 
               2

So unique() correctly recognises that the two numbers differ after the 15th significant digit, whereas table() treats them as the same value.

This may be expected behaviour but it is causing an error in some of my code as I need them both to agree:

times <- sort(unique(times))
k <- as.numeric(table(times))

times correctly pulls out the unique times. k is supposed to be the number of occurrences of each unique time, but because of the issue above the counts come out wrong.

Does anyone have a suggestion for getting unique and table to agree (or another workaround)?

Recommended answer

Trying to use unique or table on floating-point numbers is conceptually problematic from the computer's standpoint. This topic is closely related to R FAQ 7.31, excerpted here:

The only numbers that can be represented exactly in R’s numeric type are integers and fractions whose denominator is a power of 2. All other numbers are internally rounded to (typically) 53 binary digits accuracy. As a result, two floating point numbers will not reliably be equal unless they have been computed by the same algorithm, and not always even then. For example,

R> a <- sqrt(2)
R> a * a == 2
[1] FALSE
R> a * a - 2
[1] 4.440892e-16
R> print(a * a, digits = 18)
[1] 2.00000000000000044

(Other examples exist; if you are curious, I encourage you to read more in that FAQ topic.)
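
The same FAQ entry points to all.equal() for comparing numbers within a tolerance instead of with ==. A minimal sketch, reusing the value from the excerpt (the variable names exact and within_tol are only illustrative):

```r
# Compare doubles within a tolerance instead of testing exact equality.
a <- sqrt(2)

exact      <- a * a == 2                   # exact comparison, as in the FAQ excerpt
within_tol <- isTRUE(all.equal(a * a, 2))  # uses all.equal()'s default tolerance

print(c(exact = exact, within_tol = within_tol))
```

Wrapping the call in isTRUE() matters because all.equal() returns a description of the difference, not FALSE, when the values are not close enough.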

Because of this, I suggest you decide on a required precision, then use exactly that many digits when looking for uniqueness. Using your numbers, you can force the issue with format (or sprintf):

a <- c(3.099331946117620972814,
       3.099331946117621860992)

table(format(a, digits = 15))
# 3.09933194611762 
#                2 
table(format(a, digits = 16))
# 3.099331946117621 3.099331946117622 
#                 1                 1 

unique(format(a, digits = 15))
# [1] "3.09933194611762"
unique(format(a, digits = 16))
# [1] "3.099331946117621" "3.099331946117622"
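
Applied to the two lines from the question, the same idea might look like the sketch below. It rounds numerically with signif() rather than formatting to strings, so times stays numeric. The vector times_raw, its third value, and the choice of 12 significant digits are assumptions made for illustration only.

```r
# Hypothetical raw event times; the first two differ only beyond the 15th digit.
times_raw <- c(3.099331946117620972814,
               3.099331946117621860992,
               5.25)

digits  <- 12                        # assumed precision; pick what your data justify
rounded <- signif(times_raw, digits) # round once, then use the rounded values everywhere

times <- sort(unique(rounded))       # distinct times at the chosen precision
k     <- as.numeric(table(rounded))  # counts, in the same sorted order as `times`

# The two results now describe the same set of values.
stopifnot(length(times) == length(k))
```

Keeping the precision below 15 significant digits matters here, since table() cannot distinguish values beyond that.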


For the curious, the reason unique and table differ is rooted in table's use of factor, which in turn uses as.character(y). If you call as.character(a), it rounds the values to 15 significant digits (14 decimal places here):

as.character(a)
# [1] "3.09933194611762" "3.09933194611762"
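
A small sketch of where the two paths diverge, using the same vector a: unique() works on the numeric values, while factor() works on the character labels.

```r
a <- c(3.099331946117620972814,
       3.099331946117621860992)

sprintf("%.17g", a)   # 17 significant digits are enough to tell any two doubles apart
length(unique(a))     # number of distinct numeric values
nlevels(factor(a))    # number of distinct character labels, which is what table() counts
```

The last two results differ for this input, which is the mismatch the question describes.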

So to answer the question you asked: unique and table differ because table ultimately uses as.character, which by default keeps only 15 significant digits here. (Since it's a primitive, you'll have to go into the low-level source to figure that one out.)
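
If the counts really must line up with unique() at full precision, one possible workaround is to avoid the character conversion altogether and count by numeric value with match() and tabulate(). This is a sketch of an alternative, not what table() does internally, and the input vector x is hypothetical. It also does nothing to address the floating-point caveat discussed above.

```r
# Hypothetical input: three observations, two of which are the exact same double.
x <- c(3.099331946117620972814,
       3.099331946117621860992,
       3.099331946117620972814)

times <- sort(unique(x))

# match() finds each observation's position in `times` by numeric value;
# tabulate() then counts how often each position occurs.
k <- tabulate(match(x, times), nbins = length(times))

print(data.frame(time = sprintf("%.17g", times), count = k))
```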

The first part of my answer addresses the underlying assumption that using unique on floating-point numbers is a good thing to do (I argue that it is not).
