Assigning Rcpp objects into an Rcpp List yields duplicates of the last element
Question
I am trying to take an Rcpp::CharacterMatrix and convert each row to its own element within an Rcpp::List.
However, the function I have written to do so behaves oddly: every entry of the list corresponds to the last row of the matrix. Why is that? Is this some pointer-related concept? Please explain.
Function:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List char_expand_list(CharacterMatrix A) {
    CharacterVector B(A.ncol());
    List output;
    for(int i=0;i<A.nrow();i++) {
        for(int j=0;j<A.ncol();j++) {
            B[j] = A(i,j);
        }
        output.push_back(B);
    }
    return output;
}
Test matrix:
This is the matrix A that is passed to the above function.
mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
mat
#      [,1] [,2] [,3]
# [1,] "a"  "a"  "a"
# [2,] "b"  "b"  "b"
# [3,] "c"  "c"  "c"
Output:
The above function should take this matrix as input and return a list of the matrix's rows, like so:
char_expand_list(mat)
# [[1]]
# [1] "a" "a" "a"
#
# [[2]]
# [1] "b" "b" "b"
#
# [[3]]
# [1] "c" "c" "c"
But instead I am getting something different:
char_expand_list(mat)
# [[1]]
# [1] "c" "c" "c"
#
# [[2]]
# [1] "c" "c" "c"
#
# [[3]]
# [1] "c" "c" "c"
As can be seen, the last matrix row (the row of "c") is repeated in the first and second list elements. Why is this happening?
Recommended answer
What's happening here is largely a result of how Rcpp objects work. In particular, a CharacterVector acts as a pointer to a memory location. Because B is defined outside the for loop, it is effectively a "global" pointer: there is only one underlying vector, and every output.push_back(B) stores another reference to that same vector rather than a copy of it. When B is updated in the loop, the update is therefore visible through every B that has already been stored in the Rcpp::List. Hence the repeated rows of "c" throughout the list.
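The aliasing described above can be imitated in plain C++ without Rcpp. The sketch below is only an analogy: Handle is a hypothetical stand-in for an Rcpp vector, modelled with std::shared_ptr, and rows_shared and rows_fresh are illustrative names, not Rcpp functions. It assumes a rectangular input in which every row has the same length.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an Rcpp vector: a handle that shares one buffer.
using Handle = std::shared_ptr<std::vector<std::string>>;
using Matrix = std::vector<std::vector<std::string>>;

// Mirrors the buggy loop: the buffer is created once, outside the loop,
// so every stored handle refers to the same storage.
std::vector<Handle> rows_shared(const Matrix& m) {
    std::vector<Handle> out;
    if (m.empty()) return out;
    Handle b = std::make_shared<std::vector<std::string>>(m[0].size());
    for (const auto& row : m) {
        for (std::size_t j = 0; j < row.size(); ++j) (*b)[j] = row[j];
        out.push_back(b);  // copies the handle, not the data behind it
    }
    return out;
}

// Mirrors the fix: a fresh buffer is created for each row.
std::vector<Handle> rows_fresh(const Matrix& m) {
    std::vector<Handle> out;
    for (const auto& row : m) {
        out.push_back(std::make_shared<std::vector<std::string>>(row));
    }
    return out;
}
```

With the 3 x 3 test matrix from the question, every handle returned by rows_shared points at the same buffer, which holds the last row, while rows_fresh keeps the three rows distinct.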
With that said, it is a very, very, very bad idea to use .push_back() on any Rcpp data type, as you will end up copying to and fro an ever-expanding object. The copying occurs because Rcpp data types hide the underlying SEXP that controls the R object, and that object must be recreated each time it grows. As a result, you should try one of the following approaches:
- Rearrange where the Rcpp::CharacterVector is created so that it is inside the first for loop, and preallocate the Rcpp::List space.
- Switch to using only C++ standard library objects and convert to the appropriate type at the end.
  - Use a std::list of std::vector<T>, with type T being std::string here.
  - Use Rcpp::wrap(x) to return the correct object, or change the function return type from Rcpp::List to std::list<std::vector<T> >.
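To get a feel for why growing an object one element at a time is expensive, the following plain C++ sketch counts element copies under the assumption that every append allocates a new buffer one element larger and copies the old contents across. It is a simplified model of the behaviour described above, not Rcpp's actual implementation, and copies_when_growing_one_at_a_time is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Model of "recreate the whole object on every append": each step
// allocates a buffer one element larger and copies the old contents.
std::size_t copies_when_growing_one_at_a_time(std::size_t n) {
    std::vector<int> data;
    std::size_t copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<int> bigger(data.size() + 1);
        for (std::size_t k = 0; k < data.size(); ++k) {
            bigger[k] = data[k];  // old elements are copied again
            ++copies;
        }
        bigger[data.size()] = static_cast<int>(i);  // the new element
        data.swap(bigger);
    }
    return copies;
}
```

Under this model, appending n elements costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 copies of existing elements, whereas a preallocated container of size n needs none of them.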
Option 1
Here we rearrange the function by moving the declaration of B into the first loop, preallocating the list space, and accessing the output list normally.
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_rearrange(Rcpp::CharacterMatrix A) {
    Rcpp::List output(A.nrow());           // preallocate one slot per row
    for(int i = 0; i < A.nrow(); i++) {
        Rcpp::CharacterVector B(A.ncol()); // fresh vector for every row
        for(int j = 0; j < A.ncol(); j++) {
            B[j] = A(i, j);
        }
        output[i] = B;
    }
    return output;
}
Option 2
Here we remove Rcpp::CharacterVector in favor of std::vector<std::string> and substitute std::list<std::vector<std::string> > for Rcpp::List. At the end, we convert the standard object to an Rcpp::List via Rcpp::wrap().
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_std_to_list(Rcpp::CharacterMatrix A) {
    std::vector<std::string> B(A.ncol());
    std::list<std::vector<std::string> > o;
    for(int i = 0; i < A.nrow(); i++) {
        for(int j = 0; j < A.ncol(); j++) {
            B[j] = A(i, j);
        }
        o.push_back(B);   // std::list stores a copy of B
    }
    return Rcpp::wrap(o); // convert to an R list at the end
}
Giving:
mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))
char_expand_std_to_list(mat)
# [[1]]
# [1] "a" "a" "a"
#
# [[2]]
# [1] "b" "b" "b"
#
# [[3]]
# [1] "c" "c" "c"
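Note that the buffer B in Option 2 is still declared outside the loop, yet the rows come out correctly. A likely way to read this is that standard containers store elements by value, so each push_back takes an independent copy of the buffer's current contents. The sketch below reproduces the same loop shape in plain C++ without Rcpp; Row and rows_by_value are hypothetical names, and the input is assumed to be rectangular.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Same loop shape as Option 2, applied to a plain nested vector.
std::list<Row> rows_by_value(const std::vector<Row>& m) {
    std::list<Row> out;
    if (m.empty()) return out;
    Row b(m[0].size());  // one reusable buffer, declared outside the loop
    for (const auto& row : m) {
        for (std::size_t j = 0; j < row.size(); ++j) b[j] = row[j];
        out.push_back(b);  // stores an independent copy of b's contents
    }
    return out;
}
```

Later writes to the buffer do not reach the copies already stored in the list, which is the opposite of what happens when an Rcpp vector is reused in the same way.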
Option 3
Alternatively, you could keep the Rcpp::List, declare the size it is expecting ahead of time, and still use a std::vector<T> element.
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_vec(Rcpp::CharacterMatrix A) {
    std::vector<std::string> B(A.ncol());
    Rcpp::List o(A.nrow());  // preallocate one slot per row
    for(int i = 0; i < A.nrow(); i++) {
        for(int j = 0; j < A.ncol(); j++) {
            B[j] = A(i, j);
        }
        o[i] = B;            // B is converted to a new R vector here
    }
    return o;
}
Option 4
Lastly, with space predefined for the list, the data is explicitly cloned at each iteration.
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::List char_expand_list_clone(Rcpp::CharacterMatrix A) {
    Rcpp::CharacterVector B(A.ncol());
    Rcpp::List output(A.nrow());
    for(int i = 0; i < A.nrow(); i++) {
        for(int j = 0; j < A.ncol(); j++) {
            B[j] = A(i, j);
        }
        output[i] = clone(B); // deep copy, so later writes to B do not leak in
    }
    return output;
}
Benchmark
The benchmark results show that Option 1, with its rearrangement and preallocation of space, performs best. The runner-up is Option 4, which clones each vector before saving it into the Rcpp::List. On this 3 x 3 matrix the gap between the two is small (median 2.1965 vs. 2.2370 microseconds).
library("microbenchmark")
library("ggplot2")

mat = structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L, 3L))

micro_mat_to_list = microbenchmark(char_expand_list_rearrange(mat),
                                   char_expand_std_to_list(mat),
                                   char_expand_list_vec(mat),
                                   char_expand_list_clone(mat))
micro_mat_to_list
# Unit: microseconds
#                             expr   min     lq    mean median     uq    max neval
#  char_expand_list_rearrange(mat) 1.501 1.9255 3.22054 2.1965 4.8445  6.797   100
#     char_expand_std_to_list(mat) 2.869 3.2035 4.90108 3.7740 6.4415 27.627   100
#        char_expand_list_vec(mat) 1.948 2.2335 3.83939 2.7130 5.2585 24.814   100
#      char_expand_list_clone(mat) 1.562 1.9225 3.60184 2.2370 4.8435 33.965   100