加快WMA(加权移动平均线)计算 [英] Speed up WMA (Weighted Moving Average) calculation

查看:169
本文介绍了加快WMA(加权移动平均线)计算的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试计算15天柱线的指数移动平均线,但想查看15天柱线EMA在每一天(结束)时的演变".因此,这意味着我有15天的时间.每天都有新数据输入时,我想使用新信息重新计算EMA.实际上,我有15天的柱线,然后,每天以后我的新15天柱线开始增长,并且随之而来的每个新柱线都应该与之前的全部15天柱线一起用于EMA计算.

I am trying to calculate exponential moving average on 15 day bars, but want to see "evolution" of the 15 day bar EMA on each (end of) day/bar. So, this means that I have 15 day bars. When new data comes in on a daily basis I would like to recalculate EMA using new information. Actually I have 15 day bars and then, after each day my new 15 day bar starts to grow and each new bar that comes along is supposed to be used for EMA calculation together with previous full 15 day bars.

让我们说我们从2012年1月1日开始(在这个示例中,我们有每个日历天的数据),在2012年1月15日末,我们拥有第一个完整的15天栏.在2012年3月1日完成4个完整的15天柱线后,我们可以开始计算4个柱线EMA(EMA(x,n = 4)).在2012-03-02结束时,我们将使用到目前为止的信息,并假设2012-03-02的OHLC是进行中的15天,则计算2012-03-02的EMA.因此,我们采用4条完整的柱线和2012-03-02年的柱线,并计算EMA(x,n = 4).然后,我们等待另一天,查看正在进行的新的15天栏的发生情况(有关详细信息,请参见下面的to.period.cumulative函数),并计算EMA的新值...因此,接下来的15天...请参阅函数EMA.cumulative在下面了解详情...

Lets say we start at 2012-01-01 (we have data for each calender day for this example), at the end of 2012-01-15 we have the first complete 15 day bar. After 4 completed full 15 day bars on 2012-03-01 we can start calculating 4 bar EMA (EMA(x, n=4)). On the end of 2012-03-02 we use information we have until this moment and calculate EMA on 2012-03-02 pretending that OHLC for 2012-03-02 is the 15 day bar in progress. So we take the 4 complete bars and the bar on 2012-03-02 and calculate EMA(x, n=4). We then wait another day, see what happened with the new 15 day bar in progress (see function to.period.cumulative below for details) and calculate new value for EMA... And so for the next 15 days onwards... See function EMA.cumulative below for details...

请在下面找到到目前为止我能提出的建议.对于我来说,性能是不可接受的,并且由于我有限的R知识,我无法使其更快.

Below please find what I was able to come up with until now. The performance is not acceptable for me and I can not make it any faster with my limited R knowledge.

library(quantmod)

do.call.rbind <- function(lst) {
    while(length(lst) > 1) {
        idxlst <- seq(from=1, to=length(lst), by=2)

        lst <- lapply(idxlst, function(i) {
                    if(i==length(lst)) { return(lst[[i]]) }

                    return(rbind(lst[[i]], lst[[i+1]]))
                })
    }
    lst[[1]]
}

to.period.cumulative <- function(x, name=NULL, period="days", numPeriods=15) {
    if(is.null(name))
        name <- deparse(substitute(x))

    cnames <- c("Open", "High", "Low", "Close")
    if (has.Vo(x)) 
        cnames <- c(cnames, "Volume")

    cnames <- paste(name, cnames, sep=".") 

    if (quantmod:::is.OHLCV(x)) {
        x <- OHLCV(x)
        out <- do.call.rbind( 
                lapply(split(x, f=period, k=numPeriods), 
                        function(x) cbind(rep(first(x[,1]), NROW(x[,1])), 
                                cummax(x[,2]), cummin(x[,3]), x[,4], cumsum(x[,5]))))
    } else if (quantmod:::is.OHLC(x)) {
        x <- OHLC(x)
        out <- do.call.rbind( 
                lapply(split(x, f=period, k=numPeriods), 
                        function(x) cbind(rep(first(x[,1]), NROW(x[,1])), 
                                cummax(x[,2]), cummin(x[,3]), x[,4])))
    } else {
        stop("Object does not have OHLC(V).")
    }

    colnames(out) <- cnames

    return(out)
}

EMA.cumulative<-function(cumulativeBars, nEMA = 4, period="days", numPeriods=15) {
    barsEndptCl <- Cl(cumulativeBars[endpoints(cumulativeBars, on=period,     k=numPeriods)])

    # TODO: This is sloooooooooooooooooow... 
    outEMA <- do.call.rbind(
            lapply(split(Cl(cumulativeBars), period), 
                    function(x) {
                        previousFullBars <- barsEndptCl[index(barsEndptCl) < last(index(x)), ]
                        if (NROW(previousFullBars) >= (nEMA - 1)) {
                                last(EMA(last(rbind(previousFullBars, x), n=(nEMA + 1)), n=nEMA))
                        } else {
                            xts(NA, order.by=index(x))
                        }
                    }))

    colnames(outEMA) <- paste("EMA", nEMA, sep="")

    return(outEMA)
}

getSymbols("SPY", from="2010-01-01")

SPY.cumulative <- to.period.cumulative(SPY, , name="SPY")

system.time(
        SPY.EMA <- EMA.cumulative(SPY.cumulative)
)

在我的系统上,需要

   user  system elapsed 
  4.708   0.000   4.410 

可接受的执行时间将少于一秒...是否可以使用纯R来实现?

Acceptable execution time would be less than one second... Is it possible to achieve this using pure R?

此帖子链接到优化移动平均值计算-是否可以?我没有收到任何答案.现在,我可以创建一个可重现的示例,其中对我想加快的速度进行了更详细的说明.我希望这个问题现在更有意义.

This post is linked to Optimize moving averages calculation - is it possible? where I received no answers. I was now able to create a reproducible example with more detailed explanation of what I want to speed up. I hope the question makes more sense now.

任何关于如何加快速度的想法都受到高度赞赏.

Any ideas on how to speed this up are highly appreciated.

推荐答案

对于使用R的问题,我没有找到满意的解决方案.因此,我使用了旧工具c语言,结果比我预期的要好. .感谢您使用Rcpp,内联等强大工具推动"我.我猜想,每当我将来对性能有要求并且无法使用R满足时,我都会在R中添加C并获得性能.因此,请在下面查看我的代码和性能问题的解决方法.

I have not find a satisfactory solution for my question using R. So I took the old tool, c language, and results are better than I would have ever expected. Thanks for "pushing" me using this great tools of Rcpp, inline etc. Amazing. I guess, whenever I have performance requirements in the future and can not be met using R I will add C to R and performance is there. So, please see below my code and resolution of the performance issues.

# How to speedup cumulative EMA calculation
# 
###############################################################################

library(quantmod)
library(Rcpp)
library(inline)
library(rbenchmark)

do.call.rbind <- function(lst) {
    while(length(lst) > 1) {
        idxlst <- seq(from=1, to=length(lst), by=2)

        lst <- lapply(idxlst, function(i) {
                    if(i==length(lst)) { return(lst[[i]]) }

                    return(rbind(lst[[i]], lst[[i+1]]))
                })
    }
    lst[[1]]
}

to.period.cumulative <- function(x, name=NULL, period="days", numPeriods=15) {
    if(is.null(name))
        name <- deparse(substitute(x))

    cnames <- c("Open", "High", "Low", "Close")
    if (has.Vo(x)) 
        cnames <- c(cnames, "Volume")

    cnames <- paste(name, cnames, sep=".") 

    if (quantmod:::is.OHLCV(x)) {
        x <- quantmod:::OHLCV(x)
        out <- do.call.rbind( 
                lapply(split(x, f=period, k=numPeriods), 
                        function(x) cbind(rep(first(x[,1]), NROW(x[,1])), 
                                cummax(x[,2]), cummin(x[,3]), x[,4], cumsum(x[,5]))))
    } else if (quantmod:::is.OHLC(x)) {
        x <- OHLC(x)
        out <- do.call.rbind( 
                lapply(split(x, f=period, k=numPeriods), 
                        function(x) cbind(rep(first(x[,1]), NROW(x[,1])), 
                                cummax(x[,2]), cummin(x[,3]), x[,4])))
    } else {
        stop("Object does not have OHLC(V).")
    }

    colnames(out) <- cnames

    return(out)
}

EMA.cumulative<-function(cumulativeBars, nEMA = 4, period="days", numPeriods=15) {
    barsEndptCl <- Cl(cumulativeBars[endpoints(cumulativeBars, on=period, k=numPeriods)])

    # TODO: This is sloooooooooooooooooow... 
    outEMA <- do.call.rbind(
            lapply(split(Cl(cumulativeBars), period), 
                    function(x) {
                        previousFullBars <- barsEndptCl[index(barsEndptCl) < last(index(x)), ]
                        if (NROW(previousFullBars) >= (nEMA - 1)) {
                                last(EMA(last(rbind(previousFullBars, x), n=(nEMA + 1)), n=nEMA))
                        } else {
                            xts(NA, order.by=index(x))
                        }
                    }))

    colnames(outEMA) <- paste("EMA", nEMA, sep="")

    return(outEMA)
}

EMA.c.c.code <- '
    /* Initalize loop and PROTECT counters */
    int i, P=0;

    /* ensure that cumbars and fullbarsrep is double */
    if(TYPEOF(cumbars) != REALSXP) {
      PROTECT(cumbars = coerceVector(cumbars, REALSXP)); P++;
    }

    /* Pointers to function arguments */
    double *d_cumbars = REAL(cumbars);
    int i_nper = asInteger(nperiod);
    int i_n = asInteger(n);
    double d_ratio = asReal(ratio);

    /* Input object length */
    int nr = nrows(cumbars);

    /* Initalize result R object */
    SEXP result;
    PROTECT(result = allocVector(REALSXP,nr)); P++;
    double *d_result = REAL(result);

    /* Find first non-NA input value */
    int beg = i_n*i_nper - 1;
    d_result[beg] = 0;
    for(i = 0; i <= beg; i++) {
        /* Account for leading NAs in input */
        if(ISNA(d_cumbars[i])) {
            d_result[i] = NA_REAL;
            beg++;
            d_result[beg] = 0;
            continue;
        }
        /* Set leading NAs in output */
        if(i < beg) {
            d_result[i] = NA_REAL;
        }
        /* Raw mean to start EMA - but only on full bars*/
        if ((i != 0) && (i%i_nper == (i_nper - 1))) {
            d_result[beg] += d_cumbars[i] / i_n;
        }
    }

    /* Loop over non-NA input values */
    int i_lookback = 0;
    for(i = beg+1; i < nr; i++) {
        i_lookback = i%i_nper;

        if (i_lookback == 0) {
            i_lookback = 1;
        } 
        /*Previous result should be based only on full bars*/
        d_result[i] = d_cumbars[i] * d_ratio + d_result[i-i_lookback] * (1-d_ratio);
    }

    /* UNPROTECT R objects and return result */
    UNPROTECT(P);
    return(result);
'

EMA.c.c <- cfunction(signature(cumbars="numeric", nperiod="numeric", n="numeric",     ratio="numeric"), EMA.c.c.code)

EMA.cumulative.c<-function(cumulativeBars, nEMA = 4, period="days", numPeriods=15) {
    ratio <- 2/(nEMA+1)

    outEMA <- EMA.c.c(cumbars=Cl(cumulativeBars), nperiod=numPeriods, n=nEMA, ratio=ratio)  

    outEMA <- reclass(outEMA, Cl(cumulativeBars))

    colnames(outEMA) <- paste("EMA", nEMA, sep="")

    return(outEMA)
}

getSymbols("SPY", from="2010-01-01")

SPY.cumulative <- to.period.cumulative(SPY, name="SPY")

system.time(
        SPY.EMA <- EMA.cumulative(SPY.cumulative)
)

system.time(
        SPY.EMA.c <- EMA.cumulative.c(SPY.cumulative)
)


res <- benchmark(EMA.cumulative(SPY.cumulative), EMA.cumulative.c(SPY.cumulative),
        columns=c("test", "replications", "elapsed", "relative", "user.self", "sys.self"),
        order="relative",
        replications=10)

print(res)

为了表明我的工作量有所增加(我确信它可以做得更好,因为实际上我已经创建了double for循环),R在这里是打印出来的:

To give an indication of performance improvement over my cumbersome (I am sure it can be made better, since in effect I have created double for loop) R here is a print out:

> print(res)
                              test replications elapsed relative user.self
2 EMA.cumulative.c(SPY.cumulative)           10   0.026    1.000     0.024
1   EMA.cumulative(SPY.cumulative)           10  57.732 2220.462    56.755

所以,按照我的标准,这是SF类型的改进...

So, by my standards, a SF type of improvement...

这篇关于加快WMA(加权移动平均线)计算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆