Decimal points - Probability value of 0 in Language R
Question
How should I handle p-values in R?
I am expecting very low p values like:
1.00E-80
I need -log10:
-log10(1.00E-80)
-log10(0) is Inf, but I also get Inf when the p-value has merely been rounded down to 0.
It seems that below 1.00E-308, R yields 0.
1/10^308
[1] 1e-308
1/10^309
[1] 0
Does the p-value reported by the lm function have the same cutoff, 1e-308, or is it simply designed so that some cutoff is needed, in which case I should pick a different cutoff, such as 1e-100, and replace 0 with <1e-100?
Recommended answer
There are a variety of possible answers -- which one is most useful depends on the context:
- R is indeed incapable, under ordinary circumstances, of storing floating-point values closer to zero than .Machine$double.xmin, which varies by platform but is typically (as you discovered) on the order of 1e-308; subnormal values reach slightly further, at reduced precision. If you really need to work with numbers this small and can't find a way to work on the log scale directly, you will need to search Stack Overflow or the R wiki for methods for dealing with arbitrary/extended precision values (but you should probably try to work on the log scale; it will be much less of a hassle).
- In many circumstances R actually computes p-values on the (natural) log scale internally, and can, if requested, return the log values rather than exponentiating them before giving the answer. For example, dnorm(-100,log=TRUE) gives -5000.919. You can convert directly to the log10 scale (without exponentiating and then using log10) by dividing by log(10): dnorm(-100,log=TRUE)/log(10) is about -2171.9, i.e. a density of roughly 1e-2172, which would be far too small to represent in floating point. For the p*** (cumulative distribution function) functions, such as pnorm or pt, use log.p=TRUE rather than log=TRUE. (This particular point depends heavily on your particular context. Even if you are not using built-in R functions you may be able to find a way to extract results on the log scale.)
- In some cases R presents p-value results as being <2.2e-16 even when a more precise value is known:
(t1 <- t.test(rnorm(10,100),rnorm(10,80)))
prints
....
t = 56.2902, df = 17.904, p-value < 2.2e-16
but you can still extract the precise p-value from the result
> t1$p.value
[1] 1.856174e-18
(in many cases this behaviour is controlled by the format.pval() function)
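The three points above can be sketched roughly as follows. This is only an illustration: the choice of pnorm(-40) as an underflowing tail probability and the variable names are assumptions made for the example, and the p-value is the one printed by t1$p.value above.

```r
# Smallest normalized positive double on this platform (typically about 2.2e-308)
.Machine$double.xmin

# A tail probability that underflows to 0 when computed directly ...
p_direct <- pnorm(-40)
# ... but is still available on the natural-log scale
log_p <- pnorm(-40, log.p = TRUE)
# Convert to -log10(p) without ever exponentiating
neg_log10_p <- -log_p / log(10)

# format.pval() only changes how a p-value is displayed;
# `eps` sets the threshold below which "< eps" is shown instead
p <- 1.856174e-18
format.pval(p)                # default eps is .Machine$double.eps
format.pval(p, eps = 1e-100)  # p is above this threshold, so the value itself is shown
```

Working with log_p throughout, and dividing by log(10) only at the end, avoids ever forming the underflowing probability.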
An example using lm:
d <- data.frame(x=rep(1:5,each=10))
set.seed(101)
d$y <- rnorm(50,mean=d$x,sd=0.0001)
lm1 <- lm(y~x,data=d)
summary(lm1) prints the p-value of the slope as <2.2e-16, but if we use coef(summary(lm1)) (which does not use the p-value formatting), we can see that the value is 9.690173e-203.
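As a minimal sketch of how the unformatted p-values could be pulled out and transformed, assuming the same simulated data as above (the names pvals and neg_log10 are introduced here only for illustration):

```r
# Recreate the example data and fit
d <- data.frame(x = rep(1:5, each = 10))
set.seed(101)
d$y <- rnorm(50, mean = d$x, sd = 0.0001)
lm1 <- lm(y ~ x, data = d)

# The coefficient table is a plain numeric matrix; its "Pr(>|t|)" column
# holds the raw p-values, with no "< 2.2e-16" display formatting applied
pvals <- coef(summary(lm1))[, "Pr(>|t|)"]

# -log10 works directly as long as the p-value has not underflowed to 0
neg_log10 <- -log10(pvals)
neg_log10["x"]
```

This only works while the p-value is still representable; once it underflows to 0, -log10 returns Inf and a log-scale calculation is needed instead.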
A more extreme case:
set.seed(101); d$y <- rnorm(50,mean=d$x,sd=1e-7)
lm2 <- lm(y~x,data=d)
coef(summary(lm2))
shows that the p-value has actually underflowed to zero. However, we can still get an answer on the log scale:
tval <- coef(summary(lm2))["x","t value"]
(log(2)+pt(abs(tval),df=48,lower.tail=FALSE,log.p=TRUE))/log(10)
gives about -346.01, i.e. a p-value of roughly 1e-346. On the log scale the factor of 2 for the two-sided test is added as log(2) rather than multiplied. (You can check this approach with the previous example, where the p-value doesn't underflow, and see that you get the same answer as log10 of the value in the coefficient table.)
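The log-scale calculation can be wrapped in a small helper. This is a sketch only: neg_log10_p is a hypothetical function, not part of base R, and it assumes a two-sided t-test on each coefficient of an lm fit. Because doubling a probability means adding log(2) to its logarithm, the two-sided adjustment is an addition here.

```r
# Hypothetical helper: -log10 of the two-sided p-values of an lm fit,
# computed entirely on the log scale so that tiny p-values do not underflow
neg_log10_p <- function(fit) {
  tvals <- coef(summary(fit))[, "t value"]
  # log of the one-sided upper-tail probability
  log_tail <- pt(abs(tvals), df = df.residual(fit),
                 lower.tail = FALSE, log.p = TRUE)
  # two-sided: log(2 * p) = log(2) + log(p); then convert natural log to log10
  -(log(2) + log_tail) / log(10)
}

# Usage with the two simulated examples from this answer
d <- data.frame(x = rep(1:5, each = 10))
set.seed(101); d$y <- rnorm(50, mean = d$x, sd = 0.0001)
lm1 <- lm(y ~ x, data = d)
set.seed(101); d$y <- rnorm(50, mean = d$x, sd = 1e-7)
lm2 <- lm(y ~ x, data = d)

neg_log10_p(lm1)["x"]  # agrees with -log10 of the p-value in coef(summary(lm1))
neg_log10_p(lm2)["x"]  # finite, although the p-value itself underflows to 0
```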