Rcpp 中的排序排列,即 base::order() [英] Ordering Permutation in Rcpp i.e. base::order()
问题描述
我有大量使用 base::order() 命令的代码,而且我真的懒得在 rcpp 中编写代码.由于 Rcpp 只支持排序,不支持排序,所以我花了 2 分钟创建了这个函数:
I have a ton of code using the base::order() command and I am really too lazy to code around that in rcpp. Since Rcpp only supports sort, but not order, I spent 2 minutes creating this function:
// [[Rcpp::export]]
Rcpp::NumericVector order_cpp(Rcpp::NumericVector invec){
int leng = invec.size();
NumericVector y = clone(invec);
for(int i=0; i<leng; ++i){
y[sum(invec<invec[i])] = i+1;
}
return(y);
}
它以某种方式起作用.如果向量包含唯一数字,则得到与 order() 相同的结果.如果它们不是唯一的,结果就会不同,但不会错(真的没有唯一的解决方案).
It somehow works. If the vectors are containing unique numbers, I get the same result as order(). If they are not unique, results are different, but not wrong (no unique solution really).
使用:
c=sample(1:1000,500)
all.equal(order(c),order_cpp(c))
microbenchmark(order(c),order_cpp(c))
Unit: microseconds
expr min lq median uq max neval
order(c) 33.507 36.223 38.035 41.356 78.785 100
order_cpp(c) 2372.889 2427.071 2466.312 2501.932 2746.586 100
哎哟!我需要一个高效的算法.好的,所以我 挖掘一个冒泡排序实现并对其进行了调整:
Ouch! I need an efficient algorithm. Ok, so I dug up a bubblesort implementation and adapted it:
// [[Rcpp::export]]
Rcpp::NumericVector bubble_order_cpp2(Rcpp::NumericVector vec){
double tmp = 0;
int n = vec.size();
Rcpp::NumericVector outvec = clone(vec);
for (int i = 0; i <n; ++i){
outvec[i]=static_cast<double>(i)+1.0;
}
int no_swaps;
int passes;
passes = 0;
while(true) {
no_swaps = 0;
for (int i = 0; i < n - 1 - passes; ++i) {
if(vec[i] > vec[i+1]) {
no_swaps++;
tmp = vec[i];
vec[i] = vec[i+1];
vec[i+1] = tmp;
tmp = outvec[i];
outvec[i] = outvec[i+1];
outvec[i+1] = tmp;
};
};
if(no_swaps == 0) break;
passes++;
};
return(outvec);
}
嗯,它更好 - 但不是很好:
Well, it's better - but not great:
microbenchmark(order(c),order_cpp(c),bubble_order_cpp2(c),sort(c),c[order(c)])
Unit: microseconds
expr min lq median uq max neval
order(c) 33.809 38.034 40.1475 43.3170 72.144 100
order_cpp(c) 2339.080 2435.675 2478.5385 2526.8350 3535.637 100
bubble_order_cpp2(c) 219.752 231.977 234.5430 241.1840 322.383 100
sort(c) 59.467 64.749 68.2205 75.4645 148.815 100
c[order(c)] 38.336 41.204 44.3735 48.1460 93.878 100
另一个发现:订购比排序更快.
Another finding: It's faster to order than to sort.
好吧,那么对于较短的向量:
Well, then for shorter vectors:
c=sample(1:100)
microbenchmark(order(c),order_cpp(c),bubble_order_cpp2(c),sort(c),c[order(c)])
Unit: microseconds
expr min lq median uq max neval
order(c) 10.566 11.4710 12.8300 14.1880 63.089 100
order_cpp(c) 95.689 100.8200 102.7825 107.3105 198.018 100
bubble_order_cpp2(c) 9.962 11.1700 12.0750 13.2830 64.598 100
sort(c) 39.242 41.5065 42.5620 46.3355 155.758 100
c[order(c)] 11.773 12.6790 13.5840 15.9990 82.710 100
哦,好吧,我忽略了一个 RcppArmadillo 函数:
Oh well, I have overlooked an RcppArmadillo function:
// [[Rcpp::export]]
Rcpp::NumericVector ordera(arma::vec x) {
return(Rcpp::as<Rcpp::NumericVector>(Rcpp::wrap(arma::sort_index( x )+1)) );
}
microbenchmark(order(c),order_(c),ordera(c))
Unit: microseconds
expr min lq median uq max neval
order(c) 9.660 11.169 11.773 12.377 46.185 100
order_(c) 4.529 5.133 5.736 6.038 34.413 100
ordera(c) 4.227 4.830 5.434 6.038 60.976 100
推荐答案
这是一个利用 Rcpp 糖实现 order
功能的简单版本.我们检查了重复项,以确保事情按预期"工作.(当存在 NA
时,Rcpp 的 sort
方法也有一个错误,因此可能也需要检查——这将在下一个版本中修复).
Here's a simple version leveraging Rcpp sugar to implement an order
function. We put in a check for duplicates so that we guarantee that things work 'as expected'. (There is also a bug with Rcpp's sort
method when there are NA
s, so that may want to be checked as well -- this will be fixed by the next release).
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector order_(NumericVector x) {
if (is_true(any(duplicated(x)))) {
Rf_warning("There are duplicates in 'x'; order not guaranteed to match that of R's base::order");
}
NumericVector sorted = clone(x).sort();
return match(sorted, x);
}
/*** R
library(microbenchmark)
x <- runif(1E5)
identical( order(x), order_(x) )
microbenchmark(
order(x),
order_(x)
)
*/
给我
> Rcpp::sourceCpp('~/test-order.cpp')
> set.seed(456)
> library(microbenchmark)
> x <- runif(1E5)
> identical( order(x), order_(x) )
[1] TRUE
> microbenchmark(
+ order(x),
+ order_(x)
+ )
Unit: milliseconds
expr min lq median uq max neval
order(x) 15.48007 15.69709 15.86823 16.21142 17.22293 100
order_(x) 10.81169 11.07167 11.40678 11.87135 48.66372 100
>
当然,如果您对与 R 不匹配的输出感到满意,您可以删除重复的检查——x[order_(x)]
仍然会被正确排序;更具体地说,all(x[order(x)] == x[order_(x)])
应该返回 TRUE
.
Of course, if you're comfortable with the output not matching R, you can remove the duplicated check -- x[order_(x)]
will still be properly sorted; more specifically, all(x[order(x)] == x[order_(x)])
should return TRUE
.
这篇关于Rcpp 中的排序排列,即 base::order()的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!