Rcpp 中的排序排列,即 base::order() [英] Ordering Permutation in Rcpp i.e. base::order()

查看:32
本文介绍了Rcpp 中的排序排列,即 base::order()的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有大量使用 base::order() 命令的代码,而且我真的懒得在 rcpp 中编写代码.由于 Rcpp 只支持排序,不支持排序,所以我花了 2 分钟创建了这个函数:

I have a ton of code using the base::order() command and I am really too lazy to code around that in rcpp. Since Rcpp only supports sort, but not order, I spent 2 minutes creating this function:

// [[Rcpp::export]]
Rcpp::NumericVector order_cpp(Rcpp::NumericVector invec){
  int leng = invec.size();
  NumericVector y = clone(invec);
  for(int i=0; i<leng; ++i){
    y[sum(invec<invec[i])] = i+1;
  }
  return(y);
}

它以某种方式起作用.如果向量包含唯一数字,则得到与 order() 相同的结果.如果它们不是唯一的,结果就会不同,但不会错(真的没有唯一的解决方案).

It somehow works. If the vectors are containing unique numbers, I get the same result as order(). If they are not unique, results are different, but not wrong (no unique solution really).

使用:

c=sample(1:1000,500)
all.equal(order(c),order_cpp(c))
microbenchmark(order(c),order_cpp(c))

Unit: microseconds
         expr      min       lq   median       uq      max neval
     order(c)   33.507   36.223   38.035   41.356   78.785   100
 order_cpp(c) 2372.889 2427.071 2466.312 2501.932 2746.586   100

哎哟!我需要一个高效的算法.好的,所以我 挖掘一个冒泡排序实现并对其进行了调整:

Ouch! I need an efficient algorithm. Ok, so I dug up a bubblesort implementation and adapted it:

 // [[Rcpp::export]]
Rcpp::NumericVector bubble_order_cpp2(Rcpp::NumericVector vec){                                  
       double tmp = 0;
       int n = vec.size();
              Rcpp::NumericVector outvec = clone(vec);
       for (int i = 0; i <n; ++i){
         outvec[i]=static_cast<double>(i)+1.0;
       }

       int no_swaps;
       int passes;
       passes = 0;
       while(true) {
           no_swaps = 0;
           for (int i = 0; i < n - 1 - passes; ++i) {
               if(vec[i] > vec[i+1]) {
                   no_swaps++;
                   tmp = vec[i];
                   vec[i] = vec[i+1];
                   vec[i+1] = tmp;
                   tmp = outvec[i]; 
                   outvec[i] = outvec[i+1];  
                   outvec[i+1] = tmp; 
               };
           };
           if(no_swaps == 0) break;
           passes++;
       };       
       return(outvec);
}

嗯,它更好 - 但不是很好:

Well, it's better - but not great:

microbenchmark(order(c),order_cpp(c),bubble_order_cpp2(c),sort(c),c[order(c)])
Unit: microseconds
                 expr      min       lq    median        uq      max neval
             order(c)   33.809   38.034   40.1475   43.3170   72.144   100
         order_cpp(c) 2339.080 2435.675 2478.5385 2526.8350 3535.637   100
 bubble_order_cpp2(c)  219.752  231.977  234.5430  241.1840  322.383   100
              sort(c)   59.467   64.749   68.2205   75.4645  148.815   100
          c[order(c)]   38.336   41.204   44.3735   48.1460   93.878   100

另一个发现:订购比排序更快.

Another finding: It's faster to order than to sort.

好吧,那么对于较短的向量:

Well, then for shorter vectors:

c=sample(1:100)
 microbenchmark(order(c),order_cpp(c),bubble_order_cpp2(c),sort(c),c[order(c)])
Unit: microseconds
                 expr    min       lq   median       uq     max neval
             order(c) 10.566  11.4710  12.8300  14.1880  63.089   100
         order_cpp(c) 95.689 100.8200 102.7825 107.3105 198.018   100
 bubble_order_cpp2(c)  9.962  11.1700  12.0750  13.2830  64.598   100
              sort(c) 39.242  41.5065  42.5620  46.3355 155.758   100
          c[order(c)] 11.773  12.6790  13.5840  15.9990  82.710   100

哦,好吧,我忽略了一个 RcppArmadillo 函数:

Oh well, I have overlooked an RcppArmadillo function:

// [[Rcpp::export]]
Rcpp::NumericVector ordera(arma::vec x) {
  return(Rcpp::as<Rcpp::NumericVector>(Rcpp::wrap(arma::sort_index( x )+1)) );
}

microbenchmark(order(c),order_(c),ordera(c))
Unit: microseconds
      expr   min     lq median     uq    max neval
  order(c) 9.660 11.169 11.773 12.377 46.185   100
 order_(c) 4.529  5.133  5.736  6.038 34.413   100
 ordera(c) 4.227  4.830  5.434  6.038 60.976   100

推荐答案

这是一个利用 Rcpp 糖实现 order 功能的简单版本.我们检查了重复项,以确保事情按预期"工作.(当存在 NA 时,Rcpp 的 sort 方法也有一个错误,因此可能也需要检查——这将在下一个版本中修复).

Here's a simple version leveraging Rcpp sugar to implement an order function. We put in a check for duplicates so that we guarantee that things work 'as expected'. (There is also a bug with Rcpp's sort method when there are NAs, so that may want to be checked as well -- this will be fixed by the next release).

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector order_(NumericVector x) {
  if (is_true(any(duplicated(x)))) {
    Rf_warning("There are duplicates in 'x'; order not guaranteed to match that of R's base::order");
  }
  NumericVector sorted = clone(x).sort();
  return match(sorted, x);
}

/*** R
library(microbenchmark)
x <- runif(1E5)
identical( order(x), order_(x) )
microbenchmark(
  order(x),
  order_(x)
)
*/

给我

> Rcpp::sourceCpp('~/test-order.cpp')

> set.seed(456)

> library(microbenchmark)

> x <- runif(1E5)

> identical( order(x), order_(x) )
[1] TRUE

> microbenchmark(
+   order(x),
+   order_(x)
+ )
Unit: milliseconds
      expr      min       lq   median       uq      max neval
  order(x) 15.48007 15.69709 15.86823 16.21142 17.22293   100
 order_(x) 10.81169 11.07167 11.40678 11.87135 48.66372   100
> 

当然,如果您对与 R 不匹配的输出感到满意,您可以删除重复的检查——x[order_(x)] 仍然会被正确排序;更具体地说,all(x[order(x)] == x[order_(x)]) 应该返回 TRUE.

Of course, if you're comfortable with the output not matching R, you can remove the duplicated check -- x[order_(x)] will still be properly sorted; more specifically, all(x[order(x)] == x[order_(x)]) should return TRUE.

这篇关于Rcpp 中的排序排列,即 base::order()的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆