Rcpp/RcppArmadillo: removing non-contiguous elements from a vector based on positions

Question

Let's say I have a vector [2,4,6,8,10], and I need to remove the 2nd and the 4th elements from this vector. The desired resulting vector should be [2,6,10]. This is very easy to implement in R:

v1 <- c(2,4,6,8,10)
v1[-c(2,4)]

But how do I implement this in Rcpp/RcppArmadillo? I can handle the contiguous case (i.e. removing the 2nd through the 4th elements) with the .erase() function, but the non-contiguous case is less obvious to me, since .erase() does not seem to accept a uvec of positions. Speed could be a consideration because v1 could be quite large in my application.

Either an Rcpp or an Armadillo implementation is fine, as I am using both.

Recommended answer

Here is one possible approach:

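#include <Rcpp.h>
#include <cstdlib>  // std::abs

// Convert 0-based integer positions into a logical mask of length n.
// If any position is negative, the mask is inverted ("drop these elements").
// Caveat: -0 == 0, so position 0 cannot be dropped via a negative index.
Rcpp::LogicalVector logical_index(Rcpp::IntegerVector idx, R_xlen_t n) {
  bool invert = false;
  Rcpp::LogicalVector result(n, false);

  for (R_xlen_t i = 0; i < idx.size(); i++) {
    if (!invert && idx[i] < 0) invert = true;
    R_xlen_t pos = std::abs(idx[i]);
    // Guard against writing past the end of the mask.
    if (pos >= n) Rcpp::stop("index out of bounds");
    result[pos] = true;
  }

  if (!invert) return result;
  return !result;
}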


// [[Rcpp::export]]
Rcpp::NumericVector 
Subset(Rcpp::NumericVector x, Rcpp::IntegerVector idx) {
  return x[logical_index(idx, x.size())];
}

Example usage from R:

x <- seq(2, 10, 2)

x[c(2, 4)]
#[1] 4 8
Subset(x, c(1, 3))
#[1] 4 8

x[-c(2, 4)]
#[1]  2  6 10
Subset(x, -c(1, 3))
#[1]  2  6 10 

Note that the indices for the Rcpp function are 0-based, as they are processed in C++.

I abstracted the subsetting logic into its own function, logical_index, which converts an IntegerVector to a LogicalVector in order to be able to "decide" whether to drop or keep the specified elements (e.g. by inverting the result). I suppose this could be done with integer-based subsetting as well, but it should not matter either way.
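
The mask idea described above can also be sketched outside of Rcpp, using only standard-library containers. This is a minimal sketch rather than the answer's implementation: `drop_at` is a hypothetical helper name, and it assumes 0-based, non-negative positions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Rcpp or Armadillo): returns a copy of x
// without the elements at the given 0-based positions.
std::vector<double> drop_at(const std::vector<double>& x,
                            const std::vector<std::size_t>& drop) {
  // Mark every position that should be removed.
  std::vector<bool> remove(x.size(), false);
  for (std::size_t pos : drop) {
    if (pos >= x.size()) throw std::out_of_range("index out of bounds");
    remove[pos] = true;
  }

  // Copy the unmarked elements, preserving their original order.
  std::vector<double> kept;
  kept.reserve(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (!remove[i]) kept.push_back(x[i]);
  }
  return kept;
}
```

The positions do not need to be sorted or unique, and the work is one pass over the positions plus one pass over the data.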

Like vector subsetting in R, a vector of all negative indices means to drop the corresponding elements; whereas a vector of all positive indices indicates the elements to keep. I did not check for mixed cases, which should probably throw an exception, as R will do.
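
A check for the mixed case could look roughly like the following. It is a sketch under assumptions: `is_drop_selection` is a hypothetical name, it uses a plain `std::vector<int>` instead of an `Rcpp::IntegerVector`, and it treats zeros as belonging to neither sign, which is how R handles them.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns true when idx is a "drop" selection (it has
// negative entries) and false when it is a "keep" selection.
// Throws std::invalid_argument when positive and negative entries are mixed.
bool is_drop_selection(const std::vector<int>& idx) {
  bool has_negative = false;
  bool has_positive = false;
  for (int i : idx) {
    if (i < 0) has_negative = true;
    if (i > 0) has_positive = true;  // zeros count as neither sign
  }
  if (has_negative && has_positive) {
    throw std::invalid_argument("cannot mix positive and negative indices");
  }
  return has_negative;
}
```

In an Rcpp function the same test could end in `Rcpp::stop()` instead of a C++ exception, so the failure surfaces as an ordinary R error.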

Regarding my last comment, it would probably be more sensible to rely on Rcpp's native overloads for ordinary subsetting, and have a dedicated function for negated subsetting (R's x[-c(...)] construct), rather than mixing functionality as above. There are pre-existing sugar expressions for creating such a function, e.g.

#include <Rcpp.h>

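// Keep every element of x whose 0-based position is NOT listed in idx.
template <int RTYPE>
inline Rcpp::Vector<RTYPE> 
anti_subset(const Rcpp::Vector<RTYPE>& x, Rcpp::IntegerVector idx) {
  // All valid positions of x: 0, 1, ..., n - 1.
  Rcpp::IntegerVector xi = Rcpp::seq(0, x.size() - 1);
  // Subset x by the positions that remain after removing idx.
  return x[Rcpp::setdiff(xi, idx)];
}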

// [[Rcpp::export]]
Rcpp::NumericVector 
AntiSubset(Rcpp::NumericVector x, Rcpp::IntegerVector idx) {
  return anti_subset(x, idx);
}

/*** R

x <- seq(2, 10, 2)

x[-c(2, 4)]
#[1]  2  6 10

AntiSubset(x, c(1, 3))
#[1]  2  6 10

*/ 
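
Because the question mentions speed for large vectors, an in-place variant may also be worth considering. The following is only a sketch in plain C++, not something taken from Rcpp or Armadillo: `drop_sorted_in_place` is a hypothetical helper, and it assumes the positions are 0-based, sorted in ascending order, free of duplicates, and within bounds.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: removes the elements at the given 0-based positions
// from x in place. Assumes sorted_drop is sorted ascending, has no
// duplicates, and every position is < x.size().
void drop_sorted_in_place(std::vector<double>& x,
                          const std::vector<std::size_t>& sorted_drop) {
  std::size_t write = 0;  // next slot to fill with a kept element
  std::size_t d = 0;      // next entry of sorted_drop to match
  for (std::size_t read = 0; read < x.size(); ++read) {
    if (d < sorted_drop.size() && sorted_drop[d] == read) {
      ++d;  // this element is dropped, so do not copy it
    } else {
      x[write++] = x[read];
    }
  }
  x.resize(write);  // trim the leftover tail
}
```

This makes a single pass over the data and allocates no second vector, at the cost of requiring sorted input.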
