Deciding between NumericVector and arma::vec in Rcpp

Problem description

With RcppArmadillo, the conversion from R to Rcpp with arma::vec is just as easy as with Rcpp and NumericVector. My project uses RcppArmadillo.

I'm unsure what to use: NumericVector or arma::vec? What are the key differences between the two? When should I use which? Is there a performance/memory advantage to using one over the other? Are the member functions the only difference? And, as a bonus question: should I even consider arma::colvec or arma::rowvec?

Solution

What are the key differences between those two?

The *Vector and *Matrix classes in Rcpp act as wrappers for R's SEXP representation, i.e. an S-expression, which is a pointer to the data. For details, please see Section 1.1, SEXPs, of R Internals. Rcpp's design leverages this by constructing C++ objects from classes that enclose the pointer to the data. This promotes two key features:

1. Seamless transference between R and C++ objects, and
2. Low transference cost between R and C++, as only a pointer is passed.
   • The data isn't copied but referenced.

Meanwhile, arma objects are akin to a traditional std::vector<T> in that a deep copy occurs between the R and C++ objects. There is one exception to this statement: the advanced constructor, which allows the memory behind the R object to be reused inside an armadillo object's structure. Thus, if you are not careful, you may incur an unnecessary penalty during the transition from R to C++ and vice versa.

Note: The advanced constructor that allows the reuse of memory does not exist for arma::sp_mat. Thus, using references with sparse matrices will likely not yield the desired speed-up, as a copy is performed from R to C++ and back.
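
The contrast between a pointer wrapper and a deep copy can be sketched in plain C++, without R or Armadillo. The snippet below is only an analogy under stated assumptions, not how Rcpp or Armadillo are implemented: `VectorView`, `make_view`, and `make_copy` are hypothetical names, and a `std::vector<double>` stands in for the memory owned by R.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning wrapper (an assumption for illustration): it only
// stores a pointer and a length, so the memory it points to is shared.
struct VectorView {
    double*     data;
    std::size_t size;

    // Writes go straight into memory that is owned by someone else.
    void fill(double value) {
        for (std::size_t i = 0; i < size; ++i) {
            data[i] = value;
        }
    }
};

// Wrapping is O(1): only the pointer and the length are stored.
VectorView make_view(std::vector<double>& owner) {
    return VectorView{owner.data(), owner.size()};
}

// A deep copy is O(n): a new buffer is allocated and every element is copied.
std::vector<double> make_copy(const std::vector<double>& owner) {
    return std::vector<double>(owner.begin(), owner.end());
}
```

Calling `make_view(v).fill(9.0)` changes `v` itself, whereas modifying the result of `make_copy(v)` leaves `v` untouched.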

You can view the differences largely in terms of the "pass-by-reference" and "pass-by-copy" paradigms. To understand the difference outside of code, consider the following GIF by mathwarehouse:

[GIF by mathwarehouse]

To illustrate this scenario in code, consider the following three functions:

    #include <RcppArmadillo.h>
    
    // [[Rcpp::depends(RcppArmadillo)]]
    
    // [[Rcpp::export]]
    void memory_reference_double_ex(arma::vec& x, double value) {
        x.fill(value);
    }
    
    // [[Rcpp::export]]
    void memory_reference_int_ex(arma::ivec& x, int value) {
        x.fill(value);
    }
    
    // [[Rcpp::export]]
    arma::vec memory_copy_ex(arma::vec x, int value) {
        x.fill(value);
        return x;
    }
    

The two functions memory_reference_double_ex() and memory_reference_int_ex() will update the object inside of R, assuming the appropriate data type is present. As a result, we are able to avoid returning a value by specifying void in their definitions, since the memory allocated for x is being reused. The third function, memory_copy_ex(), requires a return type since it passes by copy and thus leaves the existing storage unmodified unless the result is assigned back in R.

To emphasize:

1. The x vector is passed into C++ by reference, i.e. with & at the end of arma::vec& or arma::ivec&, and
2. The type of x in R is either double or integer, meaning we are matching the underlying type of arma::vec, i.e. Col<double>, or arma::ivec, i.e. Col<int>.

Let's quickly take a look at two examples.

In the first example, we will look at the results of running memory_reference_double_ex() and compare them to the results generated by memory_copy_ex(). Note that the types of the objects defined in R and C++ are the same (i.e. double). In the next example, this will not hold.

    x = c(0.1, 2.3, 4.8, 9.1)
    typeof(x)
    # [1] "double"
    
    x
    # [1] 0.1 2.3 4.8 9.1
    
    # Nothing is returned...
    memory_reference_double_ex(x, value = 9)
    
    x
    # [1] 9 9 9 9
    
    a = memory_copy_ex(x, value = 3)
    
    x
    # [1] 9 9 9 9
    
    a
    #      [,1]
    # [1,]    3
    # [2,]    3
    # [3,]    3
    # [4,]    3
    

Now, what happens if the underlying type of the R object is integer instead of double?

    x = c(1L, 2L, 3L, 4L)
    typeof(x)
    # [1] "integer"
    
    x
    # [1] 1 2 3 4
    
    # Return nothing...
    memory_reference_double_ex(x, value = 9)
    
    x
    # [1] 1 2 3 4
    

What happened? Why didn't x get updated? Well, behind the scenes Rcpp created a new memory allocation of the proper type (double, not int) before passing it on to armadillo. This broke the reference "linkage" between the two objects: the function filled the temporary double copy, not the original integer vector.
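
The same effect can be reproduced in plain C++. This is a minimal sketch, assuming a `std::vector` stands in for the R object; `fill_double` and `fill_via_conversion` are hypothetical names, and the explicit conversion step imitates the behaviour described above rather than reproducing Rcpp's actual code.

```cpp
#include <vector>

// Fills a double buffer in place, in the spirit of memory_reference_double_ex().
void fill_double(std::vector<double>& x, double value) {
    for (double& element : x) {
        element = value;
    }
}

// Hypothetical stand-in for the conversion layer: the caller holds integers,
// so a temporary double buffer has to be allocated before the call.
void fill_via_conversion(const std::vector<int>& x, double value) {
    std::vector<double> converted(x.begin(), x.end());  // new allocation
    fill_double(converted, value);                      // modifies the temporary only
}   // the temporary is destroyed here; the caller's integers never changed
```

After `fill_via_conversion(x, 9.0)` the integer vector `x` still holds its original values, because only the temporary was filled.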

If we switch to an integer data type in the armadillo vector, notice that we get the same in-place update as before:

    memory_reference_int_ex(x, value = 3)
    
    x
    # [1] 3 3 3 3
    

This leads to a discussion of the usefulness of these two paradigms. As speed is the preferred benchmark when working with C++, let's view this in terms of a benchmark.

Consider the following two functions:

    #include <RcppArmadillo.h>
    
    // [[Rcpp::depends(RcppArmadillo)]]
    
    // [[Rcpp::export]]
    void copy_double_ex(arma::vec x, double value) {
        x.fill(value);
    }
    
    // [[Rcpp::export]]
    void reference_double_ex(arma::vec& x, double value) {
        x.fill(value);
    }
    

Running a microbenchmark over them yields:

    # install.packages(c("microbenchmark", "ggplot2"))
    library("microbenchmark")
    library("ggplot2")  # provides the autoplot() generic used below
    
    x = rep(1, 1e8)
    
    micro_timings = microbenchmark(copy_double_ex(x, value = 9.0),
                                   reference_double_ex(x, value = 9.0))
    autoplot(micro_timings)
    micro_timings
    
    # Unit: milliseconds
    #                               expr       min        lq      mean    median        uq      max neval
    #       copy_double_ex(x, value = 9) 523.55708 529.23219 547.22669 536.71177 555.00069 640.5020   100
    #  reference_double_ex(x, value = 9)  79.78624  80.70757  88.67695  82.44711  85.73199 308.4219   100
    

Note: The referenced version is about 6.5 times faster per iteration than the copied version, as we do not have to allocate a second block of memory and copy the data into it.
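
The factor quoted in the note can be reproduced from the table above: it matches the ratio of the two median timings (the ratio of the means would be somewhat smaller, about 6.17).

```latex
\[
\frac{\text{median}(\texttt{copy\_double\_ex})}{\text{median}(\texttt{reference\_double\_ex})}
= \frac{536.71177\ \text{ms}}{82.44711\ \text{ms}}
\approx 6.509771
\]
```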

When to use which?

What do you need to do?

Are you just trying to speed up an algorithm that relies on a loop but does not need rigorous linear algebra manipulation?

If so, just using Rcpp should suffice.

Are you trying to perform linear algebra manipulations? Or are you hoping to use this code across multiple libraries or computational platforms (e.g. MATLAB, Python, R, ...)?

If so, you should write the crux of the algorithm in armadillo and set up the appropriate hooks to export the functions into R with Rcpp.

Is there a performance/memory advantage of using one over the other?

Yes, as indicated previously, there is definitely a performance/memory advantage when the data is referenced rather than copied. Not only that, but by using RcppArmadillo you are effectively adding an additional library on top of Rcpp and thus increasing the overall installation footprint, compile time, and system requirements (see the woes of macOS builds). Figure out what is required by your project and then opt for that structure.

Are the only difference the member functions?

Not only member functions, but also:

• estimation routines in terms of matrix decomposition
• computation of statistical quantities
• object generation
• sparse representation (avoids manipulating an S4 object)

These are fundamental differences between Rcpp and armadillo. One is meant to facilitate transference of R objects into C++, whereas the other is meant for more rigorous linear algebra computations. This should be largely evident, as Rcpp does not implement any matrix multiplication logic, whereas armadillo uses the system's Basic Linear Algebra Subprograms (BLAS) to perform the computation.
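
To make the last point concrete, here is roughly what a hand-written matrix product looks like when no linear algebra library is available. It is a sketch under stated assumptions: `Matrix` and `multiply` are hypothetical names, storage is row-major (Armadillo itself stores matrices column-major), and no attempt is made at the blocking or vectorization a BLAS implementation provides. In Armadillo the same product is simply written `A * B`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical dense matrix type: row-major values in one flat buffer.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> values;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), values(r * c, 0.0) {}

    double& at(std::size_t i, std::size_t j) { return values[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return values[i * cols + j]; }
};

// Naive triple loop: O(a.rows * a.cols * b.cols) multiply-adds.
Matrix multiply(const Matrix& a, const Matrix& b) {
    if (a.cols != b.rows) {
        throw std::invalid_argument("inner dimensions must match");
    }
    Matrix c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t k = 0; k < a.cols; ++k) {
            for (std::size_t j = 0; j < b.cols; ++j) {
                c.at(i, j) += a.at(i, k) * b.at(k, j);
            }
        }
    }
    return c;
}
```

Everything here, including the dimension check, is bookkeeping that a linear algebra library handles for you, which is the practical argument for writing such code with armadillo rather than with bare Rcpp containers.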

And, as a bonus question: should I even consider arma::colvec or arma::rowvec?

That depends on how you want the result to be returned. Do you want a 1 x N (row vector) or an N x 1 (column vector)? By default, RcppArmadillo returns these structures as matrix objects with the appropriate dimensions, not as a traditional 1D R vector.

As an example:

    #include <RcppArmadillo.h>
    
    // [[Rcpp::depends(RcppArmadillo)]]
    
    // [[Rcpp::export]]
    arma::vec col_example(int n) {
        arma::vec x = arma::randu<arma::vec>(n);
        return x;
    }
    
    
    // [[Rcpp::export]]
    arma::rowvec row_example(int n) {
        arma::rowvec x = arma::randu<arma::rowvec>(n);
        return x;
    }
    

Test:

    set.seed(1)
    col_example(4)
    #           [,1]
    # [1,] 0.2655087
    # [2,] 0.3721239
    # [3,] 0.5728534
    # [4,] 0.9082078
    
    set.seed(1)
    row_example(4)
    #           [,1]      [,2]      [,3]      [,4]
    # [1,] 0.2655087 0.3721239 0.5728534 0.9082078
    
