Rcpp function to select (and to return) a sub-dataframe

Question

Is it possible to write a C++ function that takes an R data frame as input, modifies it (in our case by taking a subset), and returns the new data frame (in this question, a sub-dataframe)? My code below may make my question clearer:

Code:

# Suppose I have the data frame below created in R:
myDF = data.frame(id = rep(c(1,2), each = 5), alph = letters[1:10], mess = rnorm(10))

# Suppose I want to write a C++ function that gets id as input and returns
# a sub-dataframe corresponding to that id (**If it's possible to return 
# DataFrame in C++**)

// Auxiliary function --> helps get a sub vector:
arma::vec myVecSubset(arma::vec vecMain, arma::vec IDVec, int ID){
  arma::uvec AuxVec = find(IDVec == ID);
  arma::vec rslt = arma::vec(AuxVec.size());
  for (int i = 0; i < AuxVec.size(); i++){
    rslt[i] = vecMain[AuxVec[i]];
  }
  return rslt;
}

// Here is my C++ function:
Rcpp::DataFrame myVecSubset(Rcpp::DataFrame myDF, int ID){
  arma::vec id = Rcpp::as<arma::vec>(myDF["id"]);
  arma::vec alph = Rcpp::as<arma::vec>(myDF["alpha"]);
  arma::vec mess = Rcpp::as<arma::vec>(myDF["mess"]);

  // here I take a sub-vector:
  arma::vec id_sub = myVecSubset(id, id, ID);
  arma::vec alph_sub = myVecSubset(alph, id, ID);
  arma::vec mess_sub = myVecSubset(mess, id, ID);

  // here is the CHALLENGE: How to combine these vectors into a new data frame???
  ???
}

In summary, there are actually two main questions: 1) Is there a better way to take the sub-dataframe above in C++? (I wish I could simply say myDF[myDF$id == ID,]!)

2) Is there any way to combine id_sub, alph_sub, and mess_sub into an R data frame and return it?

Thanks very much for your help.

Recommended answer

You don't need Rcpp and RcppArmadillo for that; you can just use R's subset or perhaps dplyr::filter. This is likely to be more efficient than your code, which has to deep copy data from the data frame into armadillo vectors, create new armadillo vectors, and then copy these back into R vectors so that you can build the data frame. This produces a lot of waste. Another source of waste is that you run the exact same find three times.
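
The point about repeating the same find can be sketched in plain standard C++, without Rcpp or Armadillo. This is only an illustration under stated assumptions: the columns are modelled as std::vector, and matchingRows and takeRows are hypothetical helper names, not part of any library. The matching row positions are computed once and then reused for every column.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: positions of the rows whose id equals target.
std::vector<std::size_t> matchingRows(const std::vector<int>& ids, int target) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (ids[i] == target) rows.push_back(i);
    }
    return rows;
}

// Hypothetical helper: pick the given rows out of one column of any element type.
template <typename T>
std::vector<T> takeRows(const std::vector<T>& column,
                        const std::vector<std::size_t>& rows) {
    std::vector<T> out;
    out.reserve(rows.size());
    for (std::size_t r : rows) {
        assert(r < column.size());  // every index must be valid for this column
        out.push_back(column[r]);
    }
    return out;
}
```

With ids = {1, 1, 2, 2}, matchingRows(ids, 2) yields positions 2 and 3; the same index vector can then be passed to takeRows for the id, alph, and mess columns, so the search runs once rather than three times.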

Anyway, to answer your question, just use DataFrame::create.

DataFrame::create( _["id"] = id_sub, _["alpha"] = alph_sub, _["mess"] = mess_sub ) ;

Also, note that in your code, alpha will be a factor, so arma::vec alph = Rcpp::as<arma::vec>(myDF["alpha"]); is not likely to do what you want.
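
The factor remark can be illustrated with a small plain C++ model. An R factor is stored as 1-based integer codes plus a separate vector of level labels, so a numeric conversion sees the codes rather than the letters. The Factor struct and the asNumeric and asLabels helpers below are hypothetical stand-ins for illustration only, not Rcpp API. (Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, in which case the column is a character vector, which is no better suited to a numeric arma::vec.)

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an R factor: 1-based integer codes plus level labels.
struct Factor {
    std::vector<int> codes;
    std::vector<std::string> levels;
};

// What a numeric conversion sees: only the codes, as doubles.
std::vector<double> asNumeric(const Factor& f) {
    return std::vector<double>(f.codes.begin(), f.codes.end());
}

// Recover the labels by looking each code up in the level table.
std::vector<std::string> asLabels(const Factor& f) {
    std::vector<std::string> out;
    out.reserve(f.codes.size());
    for (int code : f.codes) {
        assert(code >= 1 && static_cast<std::size_t>(code) <= f.levels.size());
        out.push_back(f.levels[code - 1]);  // codes are 1-based
    }
    return out;
}
```

To keep the letters in a subset, the codes would be subset like any integer column and the level table carried along unchanged, rather than converting the column to doubles.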