Assigning Rcpp objects into an Rcpp List yields duplicates of the last element

Problem description

I am trying to take an Rcpp::CharacterMatrix and convert each row into its own element of an Rcpp::List.

However, the function I have written has an odd behavior: every entry of the list corresponds to the last row of the matrix. Why is that? Is this some pointer-related concept? Please explain.

Function

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List char_expand_list(CharacterMatrix A) {
  CharacterVector B(A.ncol());

  List output;

  for(int i=0;i<A.nrow();i++) {
    for(int j=0;j<A.ncol();j++) {
      B[j] = A(i,j);
    }

    output.push_back(B);
  }

  return output;
}

Test matrix:

This is the matrix A that is passed to the function above.

mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
mat
#     [,1] [,2] [,3]
# [1,] "a"  "a"  "a" 
# [2,] "b"  "b"  "b" 
# [3,] "c"  "c"  "c"

Output:

The function above should take this matrix as input and return a list of the matrix's rows, like so:

char_expand_list(mat)
# [[1]]
# [1] "a" "a" "a"
#
# [[2]]
# [1] "b" "b" "b"
#
# [[3]]
# [1] "c" "c" "c"

But instead I am getting something different:

char_expand_list(mat)
# [[1]]
# [1] "c" "c" "c"
#
# [[2]]
# [1] "c" "c" "c"
#
# [[3]]
# [1] "c" "c" "c"

As can be seen, the last row of the matrix ("c" "c" "c") is repeated in the first and second list elements as well. Why is this happening?

Recommended answer

What's happening here is largely a consequence of how Rcpp objects work. An Rcpp::CharacterVector is a thin wrapper around a pointer (a SEXP) to an R vector; copying it, or storing it in an Rcpp::List, copies the pointer, not the data. Because B is created once, outside the for loop, every push_back(B) stores a reference to the same R vector. Each update to B inside the loop therefore changes what all of the stored list elements see, and after the last iteration they all show the last row of the matrix. Hence the repeated "c" rows throughout the list.
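The aliasing can be reproduced without R. The sketch below is a plain C++ analogy, not Rcpp code, under the assumption that a std::shared_ptr to a row is a fair stand-in for the CharacterVector handle: copying the handle shares one buffer, just as storing B in the list shares one R vector. The matrix is passed as a column-major std::vector<std::string>, the layout R uses, and the names Row, RowHandle, cell, rows_shared_buffer and rows_fresh_buffer are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Plain C++ analogy, not Rcpp: RowHandle stands in for Rcpp::CharacterVector,
// which wraps a pointer (SEXP) to an R vector. Copying a handle shares the buffer.
using Row = std::vector<std::string>;
using RowHandle = std::shared_ptr<Row>;

// Column-major storage, as in an R matrix: element (i, j) is data[i + j * nrow].
const std::string& cell(const std::vector<std::string>& data, int nrow, int i, int j) {
    return data[static_cast<std::size_t>(i + j * nrow)];
}

// Mirrors the question's char_expand_list(): one buffer created before the loop.
std::vector<RowHandle> rows_shared_buffer(const std::vector<std::string>& data,
                                          int nrow, int ncol) {
    assert(data.size() == static_cast<std::size_t>(nrow * ncol));
    RowHandle b = std::make_shared<Row>(ncol);   // like CharacterVector B(A.ncol())
    std::vector<RowHandle> output;
    for (int i = 0; i < nrow; i++) {
        for (int j = 0; j < ncol; j++) (*b)[j] = cell(data, nrow, i, j);
        output.push_back(b);                     // stores the handle, not a copy of the row
    }
    return output;                               // every element refers to the same buffer
}

// Mirrors Option 1: a fresh buffer is allocated on every iteration.
std::vector<RowHandle> rows_fresh_buffer(const std::vector<std::string>& data,
                                         int nrow, int ncol) {
    assert(data.size() == static_cast<std::size_t>(nrow * ncol));
    std::vector<RowHandle> output(nrow);         // preallocated, like List output(A.nrow())
    for (int i = 0; i < nrow; i++) {
        RowHandle b = std::make_shared<Row>(ncol);
        for (int j = 0; j < ncol; j++) (*b)[j] = cell(data, nrow, i, j);
        output[i] = b;
    }
    return output;
}
```

Called on the data of mat (c("a", "b", "c", "a", "b", "c", "a", "b", "c") with nrow = ncol = 3), every element returned by rows_shared_buffer points to the same buffer holding "c" "c" "c", while rows_fresh_buffer returns three distinct rows.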

That said, it is a very, very, very bad idea to use .push_back() on any Rcpp data type. An Rcpp vector hides the underlying SEXP that holds the R object, and an R vector cannot grow in place, so each push_back() allocates a new R object one element longer and copies every existing element into it; the ever-expanding object is copied again on every call. As a result, you should try one of the following approaches:
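To make the cost concrete, here is a simplified, hypothetical C++ model of what growing a fixed-length vector one element at a time involves; it is not Rcpp's actual source. The class GrowByOne and the helper copies_for_appends are invented for illustration: each push allocates a buffer of length n + 1 and copies the n existing elements, and a counter records those copies.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical model: a fixed-length buffer (like an R vector) that cannot grow
// in place, so push_back must allocate a new buffer of length n + 1 and copy.
class GrowByOne {
public:
    void push_back(const std::string& value) {
        std::unique_ptr<std::string[]> bigger(new std::string[size_ + 1]);
        for (std::size_t k = 0; k < size_; k++) {
            bigger[k] = data_[k];   // copy every existing element
            copies_++;
        }
        bigger[size_] = value;
        data_ = std::move(bigger);  // the old buffer is released here
        size_++;
    }
    std::size_t size() const { return size_; }
    std::size_t copies() const { return copies_; }  // element copies made so far

private:
    std::unique_ptr<std::string[]> data_;
    std::size_t size_ = 0;
    std::size_t copies_ = 0;
};

// Appending n elements one at a time copies 0 + 1 + ... + (n - 1) = n(n - 1)/2 elements.
std::size_t copies_for_appends(std::size_t n) {
    GrowByOne v;
    for (std::size_t k = 0; k < n; k++) v.push_back("x");
    assert(v.size() == n);
    return v.copies();
}
```

For a 3-row matrix the overhead is negligible, but for n rows this grow-by-one pattern performs n(n - 1)/2 element copies in total, whereas a list preallocated to its final length only has each element written once.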

  • Move the creation of the Rcpp::CharacterVector inside the first for loop, and preallocate the space for the Rcpp::List.
  • Switch to using only C++ standard library objects and convert to the appropriate type at the end:
    • use a std::list of std::vector<T>, with T = std::string;
    • return the correct object with Rcpp::wrap(x), or change the function's return type from Rcpp::List to std::list<std::vector<T> >.

      Option 1

      Here we rearrange the function by moving the declaration of B into the first loop, preallocating the list space, and then assigning to the output list by index as usual.

      #include <Rcpp.h>
      using namespace Rcpp;
      
      // [[Rcpp::export]]
      Rcpp::List char_expand_list_rearrange(Rcpp::CharacterMatrix A) {
        Rcpp::List output(A.nrow());
      
        for(int i = 0; i < A.nrow(); i++) {
          Rcpp::CharacterVector B(A.ncol());  // a new R vector on every iteration
      
          for(int j = 0; j < A.ncol(); j++) {
            B[j] = A(i, j);
          }
      
          output[i] = B;
        }
      
        return output;
      }
      

      Option 2

      Here we replace Rcpp::CharacterVector with std::vector<std::string>, and Rcpp::List with std::list<std::vector<std::string> >. At the end, we convert the standard object to an Rcpp::List via Rcpp::wrap().

      #include <Rcpp.h>
      using namespace Rcpp;
      
      // [[Rcpp::export]]
      Rcpp::List char_expand_std_to_list(Rcpp::CharacterMatrix A) {
        std::vector<std::string> B(A.ncol());
      
        std::list<std::vector<std::string> > o;
      
        for(int i = 0; i < A.nrow(); i++) {
          for(int j = 0; j < A.ncol(); j++) {
            B[j] = A(i, j);
          }
      
          o.push_back(B);  // copies B; std::list grows without reallocating existing elements
        }
      
        return Rcpp::wrap(o);
      }
      

      This gives:

      mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
      char_expand_std_to_list(mat)
      # [[1]]
      # [1] "a" "a" "a"
      #
      # [[2]]
      # [1] "b" "b" "b"
      #
      # [[3]]
      # [1] "c" "c" "c"
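The standard-library half of Option 2 can be checked without R. In the sketch below, which is an assumption-laden stand-in rather than the answer's code, the matrix arrives as a column-major std::vector<std::string> (the layout R uses for mat) with hypothetical nrow and ncol arguments instead of an Rcpp::CharacterMatrix, and the function rows_as_std_list (also hypothetical) returns the std::list that Option 2 hands to Rcpp::wrap(). Because std::list::push_back copies B by value, reusing B across iterations is safe here, unlike in the original Rcpp version.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for the std:: part of Option 2. `data` holds the matrix
// in column-major order, as R stores it: element (i, j) is data[i + j * nrow].
std::list<std::vector<std::string> > rows_as_std_list(const std::vector<std::string>& data,
                                                      int nrow, int ncol) {
    assert(data.size() == static_cast<std::size_t>(nrow * ncol));
    std::vector<std::string> B(ncol);                 // reused across rows, as in Option 2
    std::list<std::vector<std::string> > o;
    for (int i = 0; i < nrow; i++) {
        for (int j = 0; j < ncol; j++) {
            B[j] = data[static_cast<std::size_t>(i + j * nrow)];
        }
        o.push_back(B);                               // copies B by value, so later writes do not leak
    }
    return o;
}
```

With the data of mat, this yields the rows "a" "a" "a", "b" "b" "b" and "c" "c" "c", matching the output above.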
      

      Option 3

      Alternatively, you can keep the Rcpp::List but preallocate it to the expected size, while still filling a std::vector<T> for each row. Assigning the std::vector to a list element converts it into a new R vector, so reusing B across iterations is safe.

      #include <Rcpp.h>
      using namespace Rcpp;
      
      // [[Rcpp::export]]
      Rcpp::List char_expand_list_vec(Rcpp::CharacterMatrix A) {
        std::vector<std::string> B(A.ncol());
      
        Rcpp::List o(A.nrow());
      
        for(int i = 0; i < A.nrow(); i++) {
          for(int j = 0; j < A.ncol(); j++) {
            B[j] = A(i, j);
          }
      
          o[i] = B;  // converted (copied) into a new R character vector
        }
      
        return o;
      }
      

      Option 4

      Lastly, with the list space preallocated, B can stay outside the loop as long as an explicit copy of its data, made with clone(), is stored at each iteration.

      #include <Rcpp.h>
      using namespace Rcpp;
      
      // [[Rcpp::export]]
      Rcpp::List char_expand_list_clone(Rcpp::CharacterMatrix A) {
        Rcpp::CharacterVector B(A.ncol());
        Rcpp::List output(A.nrow());
      
        for(int i = 0; i < A.nrow(); i++) {
      
          for(int j = 0; j < A.ncol(); j++) {
            B[j] = A(i, j);
          }
      
          output[i] = clone(B);  // deep copy, so later writes to B do not affect output[i]
        }
      
        return output;
      }
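The effect of clone() can be modelled the same way in plain C++. This is an analogy, not Rcpp: a std::shared_ptr to a row stands in for the CharacterVector handle, one buffer is kept outside the loop as in Option 4, and std::make_shared<Row>(*b) plays the role of clone(B) by creating a new buffer with the same contents. The names Row, RowHandle and rows_cloned are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Analogy only: RowHandle stands in for Rcpp::CharacterVector (a handle to a shared buffer).
using Row = std::vector<std::string>;
using RowHandle = std::shared_ptr<Row>;

// Mirrors Option 4: one buffer outside the loop, deep-copied before it is stored.
// `data` is column-major, as in R: element (i, j) is data[i + j * nrow].
std::vector<RowHandle> rows_cloned(const std::vector<std::string>& data, int nrow, int ncol) {
    assert(data.size() == static_cast<std::size_t>(nrow * ncol));
    RowHandle b = std::make_shared<Row>(ncol);      // like CharacterVector B(A.ncol())
    std::vector<RowHandle> output(nrow);            // like List output(A.nrow())
    for (int i = 0; i < nrow; i++) {
        for (int j = 0; j < ncol; j++) (*b)[j] = data[static_cast<std::size_t>(i + j * nrow)];
        output[i] = std::make_shared<Row>(*b);      // like clone(B): a new buffer with B's contents
    }
    return output;
}
```

Each stored handle owns its own copy, so the rows stay distinct even though the loop reuses one buffer; the price is one extra allocation and copy per row.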
      

      Benchmark

      On this 3 × 3 test matrix, Option 1 (declaring B inside the loop and preallocating the list) performs best. The runner-up is Option 4, which clones each vector before saving it into the Rcpp::List.

      library("microbenchmark")
      library("ggplot2")
      
      mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
      
      micro_mat_to_list = 
        microbenchmark(char_expand_list_rearrange(mat),
                       char_expand_std_to_list(mat),
                       char_expand_list_vec(mat),
                       char_expand_list_clone(mat))
      micro_mat_to_list
      # Unit: microseconds
      #                             expr   min     lq    mean median     uq    max neval
      #  char_expand_list_rearrange(mat) 1.501 1.9255 3.22054 2.1965 4.8445  6.797   100
      #     char_expand_std_to_list(mat) 2.869 3.2035 4.90108 3.7740 6.4415 27.627   100
      #        char_expand_list_vec(mat) 1.948 2.2335 3.83939 2.7130 5.2585 24.814   100
      #      char_expand_list_clone(mat) 1.562 1.9225 3.60184 2.2370 4.8435 33.965   100
      
