Rcpp: subsetting rows of a DataFrame

Question

I want to create the following subset of the iris dataset using the Rcpp package:

head(subset(iris, Species == "versicolor"))

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
51          7.0         3.2          4.7         1.4 versicolor
52          6.4         3.2          4.5         1.5 versicolor
53          6.9         3.1          4.9         1.5 versicolor
54          5.5         2.3          4.0         1.3 versicolor
55          6.5         2.8          4.6         1.5 versicolor
56          5.7         2.8          4.5         1.3 versicolor

I know how to subset the columns of an Rcpp::DataFrame: there is an overloaded operator [ that works as in R, x["var"]. However, I cannot find any way to subset the rows of a DataFrame whose number of columns is not fixed.

I would like to write a function subset_rows_rcpp_iris that takes an Rcpp::DataFrame (which will always be iris) and a CharacterVector level_of_species as inputs, and returns a DataFrame.

DataFrame subset_rows_rcpp_iris(DataFrame x, CharacterVector level_of_species) {
    ...
}

First, I want to find the indices of the rows that satisfy a logical condition. My problem is that if I access the Species vector inside the test function, store it as a CharacterVector and then compare it with level_of_species, I always get only a single TRUE value for setosa, and only FALSE values for the other species.

cppFunction('
    LogicalVector test(DataFrame x, CharacterVector level_of_species) {
            CharacterVector sub = x["Species"];
            LogicalVector ind = sub == level_of_species;
            return(ind);
            }
')
head(test(iris, "setosa"))

[1]  TRUE FALSE FALSE FALSE FALSE FALSE
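
A likely explanation for this output: in R, sub == "setosa" recycles the length-1 right-hand side across all 150 rows, but Rcpp sugar's vector-to-vector == appears to compare element i of each side without recycling, so only the first comparison is meaningful and the remaining values should not be relied on. The plain C++ sketch below (no Rcpp; the name compare_recycled is hypothetical) shows what R-style recycling of the right-hand side amounts to. It is simplified and assumes the left-hand side is at least as long as the right-hand side.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of the Rcpp API: element-wise equality
// that recycles the right-hand side the way R does for sub == "setosa".
// Simplification: the result has the length of lhs.
std::vector<bool> compare_recycled(const std::vector<std::string>& lhs,
                                   const std::vector<std::string>& rhs) {
    if (rhs.empty()) {
        throw std::invalid_argument("rhs must not be empty");
    }
    std::vector<bool> out(lhs.size());
    for (std::size_t i = 0; i < lhs.size(); ++i) {
        // rhs[i % rhs.size()] repeats a length-1 rhs for every element
        out[i] = (lhs[i] == rhs[i % rhs.size()]);
    }
    return out;
}
```

With a length-1 right-hand side this reduces to comparing every element against one scalar string, which is exactly what an explicit loop over the column does.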

If this worked, I could rewrite the test function to use the TRUE/FALSE vector to subset each column of the data frame separately, and then combine the columns again with Rcpp::DataFrame::create.

Recommended answer

cppFunction('LogicalVector test(DataFrame x, StringVector level_of_species) {
  // Species is a factor; assigning it to a StringVector gives its labels
  StringVector sub = x["Species"];
  // compare every row against one scalar string, not a length-1 vector
  std::string level = Rcpp::as<std::string>(level_of_species[0]);
  Rcpp::LogicalVector ind(sub.size());
  for (int i = 0; i < sub.size(); i++) {
    ind[i] = (sub[i] == level);
  }
  return ind;
}')

> xx <- test(iris, "setosa")
> table(xx)
xx
FALSE  TRUE 
  100    50 

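Subsetting done! (I learnt a lot from this question myself, thanks!) The same mask is used to subset each numeric column, and the subsets are passed to Rcpp::DataFrame::create. Note that this version keeps only the four numeric columns; Species is not part of the result.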

cppFunction('Rcpp::DataFrame test(DataFrame x, StringVector level_of_species) {
  // build the logical row mask as in the first function
  StringVector sub = x["Species"];
  std::string level = Rcpp::as<std::string>(level_of_species[0]);
  Rcpp::LogicalVector ind(sub.size());
  for (int i = 0; i < sub.size(); i++) {
    ind[i] = (sub[i] == level);
  }

  // extract each numeric column into a vector
  Rcpp::NumericVector SepalLength = x["Sepal.Length"];
  Rcpp::NumericVector SepalWidth  = x["Sepal.Width"];
  Rcpp::NumericVector PetalLength = x["Petal.Length"];
  Rcpp::NumericVector PetalWidth  = x["Petal.Width"];

  // subset each column with the logical mask and rebuild the data frame
  // (Species is not included in the result)
  return Rcpp::DataFrame::create(
    Rcpp::Named("Sepal.Length") = SepalLength[ind],
    Rcpp::Named("Sepal.Width")  = SepalWidth[ind],
    Rcpp::Named("Petal.Length") = PetalLength[ind],
    Rcpp::Named("Petal.Width")  = PetalWidth[ind]);
}')

> yy <- test(iris, "setosa")
> str(yy)
'data.frame':  50 obs. of  4 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
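
The answer above hard-codes the four numeric columns, whereas the question asked about a DataFrame with a variable number of columns. One approach in Rcpp is to loop over the columns and dispatch on each column's type (for example with a switch on TYPEOF()). The plain C++17 sketch below models only that loop structure, without Rcpp: a table is assumed to be a list of named columns, each either numeric or string, and one row mask is applied to every column. All names here (Column, NamedColumn, filter_column, subset_rows) are hypothetical and are not Rcpp API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical column type: numeric (like REALSXP) or string (like STRSXP).
using Column = std::variant<std::vector<double>, std::vector<std::string>>;

struct NamedColumn {
    std::string name;
    Column values;
};

// Keep element i of one column when mask[i] is true.
template <typename T>
std::vector<T> filter_column(const std::vector<T>& col, const std::vector<bool>& mask) {
    if (col.size() != mask.size()) {
        throw std::invalid_argument("column and mask lengths differ");
    }
    std::vector<T> out;
    for (std::size_t i = 0; i < col.size(); ++i) {
        if (mask[i]) out.push_back(col[i]);
    }
    return out;
}

// Apply the same row mask to every column, however many there are.
std::vector<NamedColumn> subset_rows(const std::vector<NamedColumn>& table,
                                     const std::vector<bool>& mask) {
    std::vector<NamedColumn> out;
    out.reserve(table.size());
    for (const auto& col : table) {
        // std::visit dispatches on the stored column type, much as a
        // switch on TYPEOF() would in an Rcpp version
        Column filtered = std::visit(
            [&mask](const auto& v) -> Column { return filter_column(v, mask); },
            col.values);
        out.push_back({col.name, std::move(filtered)});
    }
    return out;
}
```

In an Rcpp version, the filtered columns would then be collected into a List and returned as a DataFrame; a factor column such as Species would likely also need its levels and class attributes copied over, since subsetting the underlying integer codes does not keep them.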
