Large Matrices in RcppArmadillo via the ARMA_64BIT_WORD define

Question

Following a previous post, "Large SpMat object with RcppArmadillo", I decided to use Rcpp to compute a large matrix (~600,000 rows x 11 columns).

I have installed Rcpp and RcppArmadillo:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RcppArmadillo_0.7.500.0.0 Rcpp_0.12.7               cluster_2.0.4             skmeans_0.2-8            
 [5] ggdendro_0.1-20           ggplot2_2.1.0             lsa_0.73.1                SnowballC_0.5.1          
 [9] data.table_1.9.6          jsonlite_1.1              purrr_0.2.2               stringi_1.1.2            
[13] dplyr_0.5.0               plyr_1.8.4 

loaded via a namespace (and not attached):
 [1] assertthat_0.1   slam_0.1-38      MASS_7.3-45      chron_2.3-47     grid_3.3.1       R6_2.2.0         gtable_0.2.0    
 [8] DBI_0.5-1        magrittr_1.5     scales_0.4.0     tools_3.3.1      munsell_0.4.3    clue_0.3-51      colorspace_1.2-7
[15] tibble_1.2 

With an example such as mtcars, this works perfectly:

library(lsa)    
x <- as.matrix(mtcars)
cosine(t(x))

This is the cosine function from lsa:

cosR <- function(x) {
    co <- array(0, c(ncol(x), ncol(x)))
    ## f <- colnames(x)
    ## dimnames(co) <- list(f, f)
    for (i in 2:ncol(x)) {
        for (j in 1:(i - 1)) {
            co[i,j] <- crossprod(x[,i], x[,j])/
                sqrt(crossprod(x[,i]) * crossprod(x[,j]))
        }
    }
    co <- co + t(co)
    diag(co) <- 1
    return(as.matrix(co))
}

And the equivalent in Rcpp is this:

library(Rcpp)
library(RcppArmadillo)
cppFunction(depends='RcppArmadillo',
            code="NumericMatrix cosCpp(NumericMatrix Xr) {
            int n = Xr.nrow(), k = Xr.ncol();
            arma::mat X(Xr.begin(), n, k, false); // reuses memory and avoids extra copy
            arma::mat Y = arma::trans(X) * X; // matrix product
            arma::mat res = Y / (arma::sqrt(arma::diagvec(Y)) * arma::trans(arma::sqrt(arma::diagvec(Y))));
            return Rcpp::wrap(res);
           }")

You can check that the two functions are equivalent:

all.equal(cosCpp(x),cosR(x))
[1] TRUE

But when I run this on my data after loading Rcpp, I get:

x <- as.matrix(my_data)
x <- t(my_data)
y <- cosCpp(x)
error: Mat::init(): requested size is too large
Error in eval(substitute(expr), envir, enclos) : 
  Mat::init(): requested size is too large

Update: solution after @Coatless's suggestion + @gvegayon's post + hours of reading

I modified my function and now load it with:

sourceCpp("/myfolder/my_function.cpp")

The content of my_function.cpp is:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;

// [[Rcpp::export]]
arma::sp_mat cosine_rcpp(
    const arma::mat & X
) {

  int k = X.n_cols;

  arma::sp_mat ans(k,k);

  for (int i=0;i<k;i++)
    for (int j=i;j<k;j++) {
      // X(i) x X(j)' / sqrt(sum(X^2) * sum(Y^2))
      ans.at(i,j) = arma::norm_dot(X.col(i), X.col(j));

    }

    return ans;
}

Then I run:

cosine_rcpp(x)

Recommended answer

  1. RcppArmadillo is an Rcpp-only package due to the contents of the /src directory. To enable C++11, use // [[Rcpp::plugins(cpp11)]]
  2. ARMA_64BIT_WORD is not defined. To define it, add #define ARMA_64BIT_WORD 1 before the #include <RcppArmadillo.h> line.
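To see why the define matters here, it can help to do the size arithmetic by hand. The sketch below is plain C++ with no Armadillo dependency; the helper names (element_count, fits_in_32bit_index, dense_bytes) are hypothetical and exist only for illustration. It assumes, as I understand Armadillo's default configuration, that element counts are held in a 32-bit unsigned integer unless ARMA_64BIT_WORD is defined, and it uses the approximate dimensions from the question (the transposed data has about 600,000 columns, so the cosine matrix is about 600,000 x 600,000).

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Armadillo): number of elements in a dense
// n_rows x n_cols matrix, computed in 64 bits so the product does not wrap
// for the sizes discussed here.
std::uint64_t element_count(std::uint64_t n_rows, std::uint64_t n_cols) {
  return n_rows * n_cols;
}

// True if the element count can be represented by an unsigned 32-bit index,
// which is the assumed limit when ARMA_64BIT_WORD is not defined.
bool fits_in_32bit_index(std::uint64_t n_rows, std::uint64_t n_cols) {
  return element_count(n_rows, n_cols) <=
         std::numeric_limits<std::uint32_t>::max();
}

// Bytes needed to store the matrix densely as doubles.
std::uint64_t dense_bytes(std::uint64_t n_rows, std::uint64_t n_cols) {
  return element_count(n_rows, n_cols) * sizeof(double);
}
```

With these numbers, fits_in_32bit_index(600000, 11) holds for the input, but a 600,000 x 600,000 result has 3.6e11 elements and does not fit. Note that the define only lifts the index limit: stored densely as doubles, that result would still need roughly 2.88 TB.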

Using sourceCpp():

#define ARMA_64BIT_WORD 1
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp11)]] 

// [[Rcpp::export]] 
arma::mat cosCpp(const arma::mat& X) {

    arma::mat Y = arma::trans(X) * X; // matrix product
    arma::mat res = Y / (arma::sqrt(arma::diagvec(Y)) * arma::trans(arma::sqrt(arma::diagvec(Y))));

    return res;
}
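For readers less familiar with the matrix form, the normalization in cosCpp divides each entry Y(i,j) of the cross-product matrix by sqrt(Y(i,i)) * sqrt(Y(j,j)). The following is a minimal plain C++ sketch of the same computation without Armadillo; the Columns alias and the cosine_columns name are hypothetical, and it assumes all columns have the same length and none is all zeros.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container: each inner vector is one column of the input matrix.
using Columns = std::vector<std::vector<double>>;

// Cosine similarity between all pairs of columns, computed the same way as
// the matrix expression: build Y = X'X, then scale by the diagonal.
std::vector<std::vector<double>> cosine_columns(const Columns& cols) {
  const std::size_t k = cols.size();
  std::vector<std::vector<double>> y(k, std::vector<double>(k, 0.0));

  // Y(i,j) is the dot product of column i and column j.
  for (std::size_t i = 0; i < k; ++i)
    for (std::size_t j = 0; j < k; ++j)
      for (std::size_t r = 0; r < cols[i].size(); ++r)
        y[i][j] += cols[i][r] * cols[j][r];

  // res(i,j) = Y(i,j) / (sqrt(Y(i,i)) * sqrt(Y(j,j)))
  std::vector<std::vector<double>> res(k, std::vector<double>(k, 0.0));
  for (std::size_t i = 0; i < k; ++i)
    for (std::size_t j = 0; j < k; ++j)
      res[i][j] = y[i][j] / (std::sqrt(y[i][i]) * std::sqrt(y[j][j]));

  return res;
}
```

The result is a k x k matrix for k columns, which is why the output size grows with the square of the number of columns passed in.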

To define ARMA_64BIT_WORD within /src/Makevars{.win} for a package, use:

PKG_CPPFLAGS = -DARMA_64BIT_WORD=1
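One way to confirm that the flag actually reached the compiler is a small preprocessor check. This is only a sketch: arma_64bit_word_defined is a hypothetical helper name, not part of Rcpp or Armadillo, and it reports only whether the macro was defined when this translation unit was compiled.

```cpp
// Hypothetical helper: returns true when the translation unit was compiled
// with ARMA_64BIT_WORD defined (for example via -DARMA_64BIT_WORD=1 in
// PKG_CPPFLAGS or a #define placed before the Armadillo header).
constexpr bool arma_64bit_word_defined() {
#ifdef ARMA_64BIT_WORD
  return true;
#else
  return false;
#endif
}
```

Compiled without the flag, the function returns false; with the Makevars line above in effect it should return true.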
