R: how to vapply across rows for xts object?

Question

I have the following xts object:

library(xts)

x <- structure(c(30440.5, 30441, 30441.5, 30441.5, 30441, 30439.5, 30440.5, 30441,
                 30441.5, NA, NA, 30439.5, NA, NA, NA, 30441.5, 30441, NA), .indexTZ = "",
               class = c("xts", "zoo"), .indexCLASS = c("POSIXct", "POSIXt"), 
               tclass = c("POSIXct", "POSIXt"), tzone = "", 
               index = structure(c(1519866931.1185, 1519866931.1255, 1519866931.1255, 
                                   1519866931.1905, 1519866931.1905, 1519866931.1915), 
                                 tzone = "", tclass = c("POSIXct", "POSIXt")), 
               .indexFormat = "%Y-%m-%d %H:%M:%OS",
               .Dim = c(6L, 3L), .Dimnames = list(NULL, c("x", "y", "z")))
#                              x        y        z
# 2018-03-01 09:15:31.118  30440.5  30440.5       NA
# 2018-03-01 09:15:31.125  30441.0  30441.0       NA
# 2018-03-01 09:15:31.125  30441.5  30441.5       NA
# 2018-03-01 09:15:31.190  30441.5       NA  30441.5
# 2018-03-01 09:15:31.190  30441.0       NA  30441.0
# 2018-03-01 09:15:31.191  30439.5  30439.5       NA

How can I write a vapply call that computes the mean across each row with mean(..., na.rm = TRUE) and returns a single column like this?

                               w       
2018-03-01 09:15:31.118  30440.5
2018-03-01 09:15:31.125  30441.0 
2018-03-01 09:15:31.125  30441.5
2018-03-01 09:15:31.190  30441.5 
2018-03-01 09:15:31.190  30441.0 
2018-03-01 09:15:31.191  30439.5

I just can't get it to work.

I notice that many answers recommend against vapply in favor of other functions. However, according to this answer, vapply is actually the fastest. So which apply function is best here?

Recommended answer

I would not use vapply if you want the mean of the columns for each row. I would use rowMeans; note that it returns a plain numeric vector, so you have to convert the result back to xts.

(xmean <- xts(rowMeans(x, na.rm = TRUE), index(x)))
#                        [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
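
The question asked for a single column named w, while the result above has an unnamed column. A minimal sketch of one way to get that, assuming the question's data is rebuilt with the xts() constructor (the names idx and m and the tz = "UTC" setting are illustrative choices, not from the original post):

```r
library(xts)

# Rebuild the question's data from its printed table (illustrative names).
idx <- as.POSIXct(c(1519866931.1185, 1519866931.1255, 1519866931.1255,
                    1519866931.1905, 1519866931.1905, 1519866931.1915),
                  origin = "1970-01-01", tz = "UTC")  # tz is an assumption
m <- cbind(x = c(30440.5, 30441, 30441.5, 30441.5, 30441, 30439.5),
           y = c(30440.5, 30441, 30441.5, NA, NA, 30439.5),
           z = c(NA, NA, NA, 30441.5, 30441, NA))
x <- xts(m, order.by = idx)

# Row means ignoring NAs, wrapped back into a one-column xts.
w <- xts(rowMeans(x, na.rm = TRUE), order.by = index(x))

# Name the column as requested in the question.
colnames(w) <- "w"
print(w)
```

Setting options(digits.secs = 3) before printing should show the fractional seconds in the index, as in the question's listing.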

For a generic function that has no specialized row-wise implementation, I would use apply. Note that you will need to transpose the result if the function returns more than one value per row.

(xmin <- as.xts(apply(x, 1, min, na.rm = TRUE), dateFormat = "POSIXct"))
#                        [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
(xrange <- as.xts(t(apply(x, 1, range, na.rm = TRUE)), dateFormat = "POSIXct"))
#                        [,1]    [,2]
# 2018-02-28 19:15:31 30440.5 30440.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30439.5 30439.5
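
The need for t() comes from how apply() lays out its result: when the function returns several values per row, each row's result becomes a column. A small sketch on a plain matrix (the values in m are made up for illustration) shows the shapes involved:

```r
# Plain matrix standing in for the xts data: 4 rows, 3 columns (made-up values).
m <- matrix(c( 1,  5, 3,
               2, NA, 8,
               7,  7, 7,
              NA,  4, 6), nrow = 4, byrow = TRUE)

# One value per row: apply() returns a plain vector of length nrow(m).
row_min <- apply(m, 1, min, na.rm = TRUE)

# Two values per row: each row's result becomes a COLUMN, giving a 2 x 4 matrix.
raw <- apply(m, 1, range, na.rm = TRUE)
print(dim(raw))

# Transposing restores one row per original row (4 x 2).
row_range <- t(raw)
print(dim(row_range))
```

Only after the transpose does the row count match the index, which is what the conversion back to xts relies on.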

To address the comment asking "why not use vapply()", here are some benchmarks (using the data from the Code Review Q/A the OP linked to):

library(xts)

set.seed(21)
xz <- xts(replicate(6, sample(c(1:100), 1000, replace = TRUE)),
          order.by = Sys.Date() + 1:1000)
xrowmean <- function(x) { xts(rowMeans(x, na.rm = TRUE), index(x)) }
xapply <- function(x) { as.xts(apply(x, 1, mean, na.rm = TRUE), dateFormat = "POSIXct") }
xvapply <- function(x) { xts(vapply(seq_len(nrow(x)), function(i) {
    mean(x[i,], na.rm = TRUE) }, FUN.VALUE = numeric(1)), index(x)) }

library(microbenchmark)
microbenchmark(xrowmean(xz), xapply(xz), xvapply(xz))
# Unit: microseconds
#          expr       min         lq       mean     median         uq       max neval
#  xrowmean(xz)   169.496   188.8505   207.1931   204.2455   219.4945   285.329   100
#    xapply(xz) 33477.542 34203.3260 35698.0503 35076.4655 36821.1320 43910.353   100
#   xvapply(xz) 32709.238 35010.1920 37514.7557 35884.3585 37972.7085 84409.961   100

So, why not use vapply()? It doesn't add much in the way of performance benefit. It's quite a bit more verbose than the apply() version, and it's not clear that the safety of the pre-specified return value buys you much if you control both the type of the object and the function being called. That said, you're not going to do any harm by using vapply(). I simply prefer apply() for this case.
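
If you still want the return-type check that vapply() provides, one possible variant is to extract the underlying matrix once with coredata() and index that instead of subsetting the xts object on every iteration. This is only a sketch: xvapply_mat is a hypothetical helper that is not part of the original answer, and it was not included in the benchmark above, so no timing is claimed for it.

```r
library(xts)

set.seed(21)
# Smaller version of the benchmark data: 50 rows, 6 columns.
xz <- xts(replicate(6, sample(1:100, 50, replace = TRUE)),
          order.by = Sys.Date() + 1:50)

# Hypothetical helper: row means via vapply() on the plain matrix.
xvapply_mat <- function(x) {
  m <- coredata(x)  # plain matrix without the time index
  means <- vapply(seq_len(nrow(m)),
                  function(i) mean(m[i, ], na.rm = TRUE),
                  FUN.VALUE = numeric(1))  # errors unless each result is one number
  xts(means, order.by = index(x))
}

res <- xvapply_mat(xz)

# Should agree with the rowMeans() approach.
print(all.equal(as.numeric(res), as.numeric(rowMeans(xz, na.rm = TRUE))))
```

FUN.VALUE = numeric(1) makes vapply() stop with an error if any row yields something other than a single number, which is the safety property discussed above.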
