Rcpp/RcppArmadillo: removing non-contiguous elements from a vector based on positions
Question
Let's say I have a vector [2,4,6,8,10], and I need to remove the 2nd and the 4th elements from this vector. The desired resulting vector should be [2,6,10]. This is very easy to implement in R:
v1 <- c(2,4,6,8,10)
v1[-c(2,4)]
But how do I implement this in Rcpp/RcppArmadillo? I can figure out the contiguous case (i.e. removing the 2nd through the 4th elements) by using the .erase() function, but the non-contiguous case doesn't seem so obvious to me, since .erase does not seem to accept vectors of type uvec. Speed could be a consideration because v1 could be quite large in my application.
Either Rcpp or Armadillo implementation is fine by me as I am using both.
Recommended answer
Here is one possible approach:
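#include <Rcpp.h>
#include <cstdlib>  // std::abs

// Convert signed, 0-based positions into a logical mask of length n.
// Any negative index switches the result to "drop" mode (mask inverted).
Rcpp::LogicalVector logical_index(Rcpp::IntegerVector idx, R_xlen_t n) {
    bool invert = false;
    Rcpp::LogicalVector result(n, false);

    for (R_xlen_t i = 0; i < idx.size(); i++) {
        if (!invert && idx[i] < 0) invert = true;
        R_xlen_t pos = std::abs(idx[i]);
        // Guard against writing past the end of the mask.
        if (pos >= n) Rcpp::stop("index out of bounds");
        result[pos] = true;
    }

    if (!invert) return result;
    return !result;
}

// [[Rcpp::export]]
Rcpp::NumericVector
Subset(Rcpp::NumericVector x, Rcpp::IntegerVector idx) {
    return x[logical_index(idx, x.size())];
}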
x <- seq(2, 10, 2)
x[c(2, 4)]
#[1] 4 8
Subset(x, c(1, 3))
#[1] 4 8
x[-c(2, 4)]
#[1] 2 6 10
Subset(x, -c(1, 3))
#[1] 2 6 10
Note that the indices for the Rcpp function are 0-based, as they are processed in C++.
I abstracted the subsetting logic into its own function, logical_index, which converts an IntegerVector to a LogicalVector in order to be able to "decide" whether to drop or keep the specified elements (e.g. by inverting the result). I suppose this could be done with integer-based subsetting as well, but it should not matter either way.
Like vector subsetting in R, a vector of all negative indices means to drop the corresponding elements; whereas a vector of all positive indices indicates the elements to keep. I did not check for mixed cases, which should probably throw an exception, as R will do.
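The unchecked mixed case could be handled along the following lines. This is only a minimal sketch in plain C++ (std::vector rather than Rcpp types, so it can be compiled on its own); keep_mask and subset are hypothetical names, not Rcpp functions. It assumes that 0 counts as neither positive nor negative, because with 0-based indices -0 equals 0 and so element 0 can only be dropped together with at least one negative index.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: turn signed, 0-based positions into a keep-mask.
// All non-negative -> keep those positions; any negative -> drop them.
// Mixing strictly positive and strictly negative indices is an error.
std::vector<bool> keep_mask(const std::vector<int>& idx, std::size_t n) {
    bool has_pos = false, has_neg = false;
    for (int i : idx) {
        if (i > 0) has_pos = true;
        if (i < 0) has_neg = true;
    }
    if (has_pos && has_neg)
        throw std::invalid_argument("cannot mix positive and negative indices");

    std::vector<bool> selected(n, false);
    for (int i : idx) {
        // Widen before negating so the absolute value cannot overflow.
        long long mag = (i < 0) ? -static_cast<long long>(i) : i;
        if (static_cast<unsigned long long>(mag) >= n)
            throw std::out_of_range("index out of range");
        selected[static_cast<std::size_t>(mag)] = true;
    }
    if (has_neg) selected.flip();  // negative indices mean "drop"
    return selected;
}

// Hypothetical helper: apply the mask, preserving the original order of x.
std::vector<double> subset(const std::vector<double>& x,
                           const std::vector<int>& idx) {
    const std::vector<bool> keep = keep_mask(idx, x.size());
    std::vector<double> out;
    out.reserve(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        if (keep[i]) out.push_back(x[i]);
    return out;
}
```

With x = {2, 4, 6, 8, 10}, calling subset(x, {-1, -3}) yields {2, 6, 10}, subset(x, {1, 3}) yields {4, 8}, and subset(x, {1, -3}) throws std::invalid_argument. Unlike R, a mask always returns elements in their original order and ignores repeated indices.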
Regarding my last comment, it would probably be more sensible to rely on Rcpp's native overloads for ordinary subsetting, and have a dedicated function for negated subsetting (R's x[-c(...)] construct), rather than mixing functionality as above. There are pre-existing sugar expressions for creating such a function, e.g.
#include <Rcpp.h>

// Return x without the elements at the 0-based positions in idx.
template <int RTYPE>
inline Rcpp::Vector<RTYPE>
anti_subset(const Rcpp::Vector<RTYPE>& x, Rcpp::IntegerVector idx) {
    Rcpp::IntegerVector xi = Rcpp::seq(0, x.size() - 1);
    return x[Rcpp::setdiff(xi, idx)];
}

// [[Rcpp::export]]
Rcpp::NumericVector
AntiSubset(Rcpp::NumericVector x, Rcpp::IntegerVector idx) {
    return anti_subset(x, idx);
}
/*** R
x <- seq(2, 10, 2)
x[-c(2, 4)]
#[1] 2 6 10
AntiSubset(x, c(1, 3))
#[1] 2 6 10
*/
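One caveat, stated as an assumption rather than a documented guarantee: Rcpp's setdiff appears to be built on a hash set, so the remaining indices may not come back in ascending order for every input. If the original order matters, or if v1 is large enough that a single linear pass is preferable, the same idea can be sketched in plain C++ as below. drop_positions is a hypothetical name, and the choice to silently ignore out-of-range positions mirrors how R treats x[-k] when k exceeds the length of x.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copy x, skipping the 0-based positions listed in drop.
// Runs in O(n + k) time and keeps the surviving elements in original order.
template <typename T>
std::vector<T> drop_positions(const std::vector<T>& x,
                              const std::vector<std::size_t>& drop) {
    std::vector<bool> is_dropped(x.size(), false);
    for (std::size_t pos : drop)
        if (pos < x.size()) is_dropped[pos] = true;  // out-of-range: ignored

    std::vector<T> out;
    out.reserve(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        if (!is_dropped[i]) out.push_back(x[i]);
    return out;
}
```

For x = {2, 4, 6, 8, 10} and drop = {1, 3}, this returns {2, 6, 10}. The same loop carries over to Rcpp or Armadillo vectors by swapping the container types.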