Wrong result from mean(x, na.rm = TRUE)


Question

I want to compute the mean, min and max of a series of manager returns, as follows:

ManagerRet <-data.frame(diff(Managerprices)/lag(Managerprices,k=-1))

I then replace zero returns with NaN, since the data are extracted from a database and not all dates are populated.

ManagerRet = replace(ManagerRet,ManagerRet==0,NaN)

I then call the following three functions:

> min(ManagerRet,na.rm = TRUE)
[1] -0.0091716

> max(ManagerRet,na.rm = TRUE)
[1] 0.007565

> mean(ManagerRet,na.rm = TRUE)*252
[1] NaN

Why does the mean function return NaN while min and max perform the calculation properly?

Below is the zoo object ManagerRet:

> ManagerRet
               Manager
2011-10-04         NaN
2011-10-05         NaN
2011-10-06         NaN
2011-10-07         NaN
2011-10-11         NaN
2011-10-12         NaN
2011-10-13         NaN
2011-10-14         NaN
2011-10-17         NaN
2011-10-18         NaN
2011-10-19         NaN
2011-10-20         NaN
2011-10-21         NaN
2011-10-24         NaN
2011-10-25         NaN
2011-10-26         NaN
2011-10-27         NaN
2011-10-28         NaN
2011-10-31  6.3832e-04
2011-11-01 -4.4625e-06
2011-11-02  2.8142e-03
2011-11-03  5.1114e-04
2011-11-04 -1.0105e-03
2011-11-07  7.5650e-03
2011-11-08  2.1002e-03
2011-11-09 -9.1716e-03
2011-11-10  1.1173e-03
2011-11-14 -6.9207e-03
2011-11-15  2.6241e-04
2011-11-16  1.7520e-03
2011-11-17 -2.6443e-05
2011-11-18 -1.4169e-03
2011-11-21  3.7602e-04
2011-11-22  4.3982e-05
2011-11-23 -6.7328e-06
2011-11-25  1.1571e-05
2011-11-28  1.4016e-07
2011-11-29 -2.0426e-07

Additional information, as requested:

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252   
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C                  
[5] LC_TIME=Italian_Italy.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] gWidgetsRGtk2_0.0-81       gWidgets_0.0-52           
 [3] RGtk2_2.20.25              lattice_0.20-15           
 [5] moments_0.13               data.table_1.8.8          
 [7] tseries_0.10-30            timeDate_2160.97          
 [9] PerformanceAnalytics_1.1.0 xts_0.9-3                 
[11] zoo_1.7-9                  RODBC_1.3-6               

loaded via a namespace (and not attached):
[1] grid_2.15.2    quadprog_1.5-4

Recommended answer

You should use colMeans for this:

colMeans(ManagerRet, na.rm=TRUE)
##       Manager 
## -6.826297e-05 
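As a rough cross-check that needs no zoo installation, the same numbers can be placed in a plain base R matrix. The name `ret_matrix` is hypothetical and not from the original post; the values are copied from the dput() output in the question.

```r
# Plain matrix holding the 38 returns from the question (18 NaN, then 20 values).
ret_values <- c(rep(NaN, 18),
                0.00063832, -4.4625e-06, 0.0028142, 0.00051114, -0.0010105,
                0.007565, 0.0021002, -0.0091716, 0.0011173, -0.0069207,
                0.00026241, 0.001752, -2.6443e-05, -0.0014169, 0.00037602,
                4.3982e-05, -6.7328e-06, 1.1571e-05, 1.4016e-07, -2.0426e-07)
ret_matrix <- matrix(ret_values, ncol = 1, dimnames = list(NULL, "Manager"))

# colMeans() treats NaN as missing when na.rm = TRUE and averages per column.
col_mean <- colMeans(ret_matrix, na.rm = TRUE)
print(col_mean)

# Annualising as in the question (252 trading days).
print(col_mean * 252)
```

The per-column result should agree with the colMeans output shown above, which suggests the data themselves are fine and the problem lies in how the zoo object is subsetted.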

If this had been a data.frame, you would have received a warning (but correct output).

Here, you have exposed an inconsistency in the way that a data.frame and a zoo object are subsetted with [ with a logical matrix index. This appears to be a bug in [.zoo. I have emailed the maintainer.

The problem occurs at this step within mean.default:

if (na.rm) 
    x <- x[!is.na(x)]

This is where it goes wrong:

ManagerRet[!is.na(ManagerRet)]
##   1 
## NaN 
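For comparison, here is a minimal base R sketch of what mean.default relies on when x is an ordinary matrix: indexing with a logical matrix returns a plain vector of the selected elements. The matrix `m` is a small made-up stand-in, not the poster's data.

```r
# Small stand-in matrix with two NaN entries (illustrative values only).
m <- matrix(c(NaN, NaN, 0.5, -0.25), ncol = 1, dimnames = list(NULL, "Manager"))

# Logical-matrix indexing on a plain matrix keeps exactly the non-missing values.
kept <- m[!is.na(m)]
print(kept)        # 0.50 -0.25

# This is the vector mean.default would then average.
print(mean(kept))  # 0.125
```

On the zoo object from the question, the same indexing expression returned a single NaN instead, which is why the mean came out as NaN.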

!is.na(ManagerRet) looks as expected, but isn't:

class(!is.na(ManagerRet))
[1] "matrix"

This class is unexpected in [.zoo. These lines are present:

if (all(class(i) == "logical")) 
    i <- which(rep(i, length.out = n2))
else if (inherits(i, "zoo") && all(class(coredata(i)) == 
    "logical")) {
    i <- which(coredata(merge(zoo(, time(x)), i)))
}
else if (!((all(class(i) == "numeric") || all(class(i) == 
    "integer")))) 
    i <- which(MATCH(index(x), i, nomatch = 0L) > 0L)

The last line here is actually run in this case, producing incorrect results.
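The branch selection can be traced in base R by evaluating the same class tests on a logical matrix. This is only a sketch: `m` is a made-up plain matrix standing in for the zoo object, and `first_branch`, `second_branch` and `third_branch` are hypothetical names for the three conditions quoted above.

```r
# Stand-in data; is.na() on it yields a logical matrix, as in the answer.
m <- matrix(c(NaN, NaN, 0.5, -0.25), ncol = 1, dimnames = list(NULL, "Manager"))
i <- !is.na(m)

# The values are logical, but the class is that of a matrix, not "logical".
print(typeof(i))
print(class(i))

# Condition 1: only true for a bare logical vector.
first_branch <- all(class(i) == "logical")

# Condition 2: only true for a zoo object wrapping logical data.
second_branch <- inherits(i, "zoo")

# Condition 3: anything that is neither numeric nor integer by class.
third_branch <- !(all(class(i) == "numeric") || all(class(i) == "integer"))

print(c(first = first_branch, second = second_branch, third = third_branch))
# first and second are FALSE, third is TRUE
```

Because only the third condition holds, the index is treated as a set of time stamps to match against index(x) rather than as a logical mask.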

Structure:

> dput(ManagerRet)
structure(c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 0.00063832, -4.4625e-06, 
0.0028142, 0.00051114, -0.0010105, 0.007565, 0.0021002, -0.0091716, 
0.0011173, -0.0069207, 0.00026241, 0.001752, -2.6443e-05, -0.0014169, 
0.00037602, 4.3982e-05, -6.7328e-06, 1.1571e-05, 1.4016e-07, 
-2.0426e-07), .Dim = c(38L, 1L), .Dimnames = list(c("2011-10-04", 
"2011-10-05", "2011-10-06", "2011-10-07", "2011-10-11", "2011-10-12", 
"2011-10-13", "2011-10-14", "2011-10-17", "2011-10-18", "2011-10-19", 
"2011-10-20", "2011-10-21", "2011-10-24", "2011-10-25", "2011-10-26", 
"2011-10-27", "2011-10-28", "2011-10-31", "2011-11-01", "2011-11-02", 
"2011-11-03", "2011-11-04", "2011-11-07", "2011-11-08", "2011-11-09", 
"2011-11-10", "2011-11-14", "2011-11-15", "2011-11-16", "2011-11-17", 
"2011-11-18", "2011-11-21", "2011-11-22", "2011-11-23", "2011-11-25", 
"2011-11-28", "2011-11-29"), "Manager"), index = 1:38, class = "zoo")

Old code (colMeans is the proper way to do this). Specifying the "column" with $ gets around the problem:

mean(ManagerRet, na.rm=TRUE)
## [1] NaN
mean(ManagerRet$Manager, na.rm=TRUE)
## [1] -6.826297e-05
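Another possible workaround, not mentioned in the original answer, is to strip the zoo class with coredata() before averaging, so that mean.default operates on a plain matrix. The sketch below assumes the zoo package is installed and is skipped otherwise; the object `z` and its values are illustrative, not the poster's data.

```r
# Result stays NA if zoo is not available (assumption: zoo may be missing).
core_mean <- NA_real_

if (requireNamespace("zoo", quietly = TRUE)) {
  # One-column zoo series with two missing observations (made-up values).
  vals <- matrix(c(NaN, NaN, 0.01, -0.02, 0.04), ncol = 1,
                 dimnames = list(NULL, "Manager"))
  z <- zoo::zoo(vals, order.by = seq_len(nrow(vals)))

  # coredata() returns the underlying matrix, so the na.rm step in
  # mean.default uses ordinary matrix indexing rather than `[.zoo`.
  core_mean <- mean(zoo::coredata(z), na.rm = TRUE)
}

print(core_mean)
```

Note that mean() on a matrix averages over all cells, so for multi-column data colMeans remains the better fit.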
