使用列表以有效的方式在 C++ 中使用 RCPP 返回一堆矩阵 [英] Returning bunch of matrices using RCPP in C++ in an efficient way using a list

查看:62
本文介绍了使用列表以有效的方式在 C++ 中使用 RCPP 返回一堆矩阵的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用 RCPP 返回一堆矩阵.我下面的代码效率极低.我想知道以下代码是否有效.

I am trying to return a bunch of matrices using RCPP. My code below is extremely inefficient. I would like to know if the following code can be efficient.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
Rcpp::List hello( 
    const arma::rowvec& g,
    const int& n, 
    const int& p,
    const arma::mat& S,
    const arma::mat& zc,
    const arma::rowvec& dl){
  Rcpp::List ht(n);

  for(int t=0; t < n;++t){

    arma::mat hhat(p,n);
    hhat.fill(0.0);
    for(int i = 0;i < n; ++i){
      arma::mat h(p,1);
      h.fill(0.0);
      if (t > i){
        for(int u=i;u <= t; ++u){
          arma::rowvec zr = zc.rows(i,i);
          h += exp(arma::as_scalar(g*zr.t())) * (zr.t() - S.cols(u,u))*dl(u);
        }
      }
      hhat.cols(i,i) = h;
    }
    ht[t] = hhat;
  }

  // Specify list length
  Rcpp::List res(1);
  res[0] = ht;

  return(res);
}

这是示例.

g=c(1,2.1,3.1)
n=1600
p=3
S = matrix(rnorm(4800),nrow=3,ncol=1600)
dl=runif(1600)
z=matrix(runif(4800),nrow=1600,ncol=3)
ptm=proc.time();kkk= hello(g=g,n=n,p=p,S = S,zc=z,dl = dl);proc.time()-ptm;
 user  system elapsed 
  31.25    0.00   31.30 

任何帮助将不胜感激.

遵循更新的代码.最初我正在返回一个列表的列表.现在它返回一个列表.这将计算时间减少了 10 秒.我希望这段代码可以进一步改进.

Following the updated code. Initially I was returning list of a list. Now it returns a list. This reduces the computing time by 10 seconds. I hope this code can be improved further.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
Rcpp::List hello( 
    const arma::rowvec& g,
    const int& n, 
    const int& p,
    const arma::mat& S,
    const arma::mat& zc,
    const arma::rowvec& dl){
  Rcpp::List ht(n);


  for(int t=0; t < n;++t){

    arma::mat hhat(p,n);
    hhat.zeros();
    for(int i = 0;i < n; ++i){
      arma::mat h(p,1);
      // h.fill(0.0);
      h.zeros();
      if (t > i){
        for(int u=i;u <= t; ++u){
          //arma::rowvec zr = zc.rows(i,i);
          h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(u))*dl(u);
        }
      }
      hhat.col(i) = h;
    }
    ht[t] = hhat;
  }

  // Specify list length
  // Rcpp::List res(1);
  // res[0] = ht;

  return(ht);
}

我试图实现的公式如下.

The formula that I am trying to implement is given below.

推荐答案

在我的另一个答案中,我研究了返回数据的效率和简单的优化.在这里我想看看不同的东西:算法的优化.

In my other answer I looked at the efficiency of returning data and at simple optimizations. Here I want to look at something different: Optimization of the algorithm.

你想计算 hhat(i, t) for 0 <= i, t <ni .看看你的公式,我们看到 hhatit 的依赖是非常不同的.特别是,hhat(i, t + 1) 可以写成 hhat(i, t) + something.现在你的外循环结束了 t 并且你正在重新计算所有这些中间值.通过切换循环顺序,每个这样的计算很容易只执行一次,将算法降低到两个嵌套循环.这意味着您必须单独生成结果矩阵.由于您不能在 Rcpp::List 中存储 arma::mat,我需要一个额外的 std::vector 用于存储:

You want to compute hhat(i, t) for 0 <= i, t < n and i < t. Looking at your formula we see that the dependency of hhat on i and t is very different. In particular, hhat(i, t + 1) can be written as hhat(i, t) + something. Right now your outer loop is over t and you are recomputing all these intermediate values. By switching the loop order, it is easy to do each such computation only once, bringing the algorithm down to a two nested loops. This means you have to generate the resulting matrices separately. And since you cannot store an arma::mat inside a Rcpp::List, I need an additional std::vector for storage:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
Rcpp::List hello( 
        const arma::rowvec& g,
        const int& n, 
        const int& p,
        const arma::mat& S,
        const arma::mat& zc,
        const arma::rowvec& dl){

    std::vector<arma::mat> foo(n);
    for(int t=0; t < n;++t){
        arma::mat hhat(p,n);
        hhat.zeros();
        foo[t] = hhat;
    }

    for(int i = 0;i < n; ++i){
        arma::mat h = exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(i))*dl(i);
        for(int t=i+1; t < n;++t){
            h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(t))*dl(t);
            foo[t].col(i) = h;
        }
    }

    Rcpp::List ht(n);
    for(int t=0; t < n;++t){
        ht[t] = foo[t];
    }

    return(ht);
}

// [[Rcpp::export]]
Rcpp::List hello_orig( 
        const arma::rowvec& g,
        const int& n, 
        const int& p,
        const arma::mat& S,
        const arma::mat& zc,
        const arma::rowvec& dl){
    Rcpp::List ht(n);


    for(int t=0; t < n;++t){

        arma::mat hhat(p,n);
        hhat.zeros();
        for(int i = 0;i < n; ++i){
            arma::mat h(p,1);
            h.zeros();
            if (t > i){
                for(int u=i;u <= t; ++u){
                    h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(u))*dl(u);
                }
            }
            hhat.col(i) = h;
        }
        ht[t] = hhat;
    }

    return(ht);
}

/***R
g=c(1,2.1,3.1)
n=1600
p=3
S = matrix(rnorm(p*n),nrow=p,ncol=n)
dl=runif(n)
z=matrix(runif(p*n),nrow=n,ncol=p)
bench::mark(hello_orig(g=g,n=n,p=p,S = S,zc=z,dl = dl),
            hello(g=g,n=n,p=p,S = S,zc=z,dl = dl))
*/

结果:

# A tibble: 2 x 13
  expression                                                 min median `itr/sec` mem_alloc
  <bch:expr>                                              <bch:> <bch:>     <dbl> <bch:byt>
1 hello_orig(g = g, n = n, p = p, S = S, zc = z, dl = dl)  14.2s  14.2s    0.0703    58.7MB
2 hello(g = g, n = n, p = p, S = S, zc = z, dl = dl)      53.9ms 85.9ms   11.1       58.7MB
# … with 8 more variables: `gc/sec` <dbl>, n_itr <int>, n_gc <dbl>, total_time <bch:tm>,
#   result <list>, memory <list>, time <list>, gc <list>

快了 100 倍以上!

More than a factor 100 faster!

您可以通过在评论中引用@coatless 的建议来使用 arma::cube,从而使代码更简洁(甚至可能更快一些).不过,最紧凑的形式将为您提供不同的回报结构.你会得到一个 p x n x n 数组,而不是 p x n 的列表:

You can get cleaner (and maybe even a bit faster code) by floowing @coatless' suggestions in the comments to use an arma::cube. The most compact form will give you a different return structure, though. Instead of a list of p x n you will get a p x n x n array:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
arma::cube coatless( 
        const arma::rowvec& g,
        const int& n, 
        const int& p,
        const arma::mat& S,
        const arma::mat& zc,
        const arma::rowvec& dl){

    arma::cube ht(p, n, n);
    ht.zeros();

    for(int i = 0;i < n; ++i){
        arma::mat h = exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(i))*dl(i);
        for(int t=i+1; t < n;++t){
            h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(t))*dl(t);
            ht.slice(t).col(i) = h;
        }
    }

    return(ht);
}

这篇关于使用列表以有效的方式在 C++ 中使用 RCPP 返回一堆矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆