Efficiently create a derangement of a vector in R

Question

I'm looking for a way to efficiently create a derangement (and, conversely, specific permutations) of a vector in R. As far as I can see, there is no base function that does this, and there is not much about it here on SO.

An obvious start is sample, which creates a permutation of a vector. But I need this permutation to have no fixed points, that is, to be a derangement of the vector. For a nice explanation of this topic, see this Cross Validated post.

Here is my first approach:

derangr <- function(x){
  while(TRUE){
    xp <- sample(x)
    if(sum(xp == x) == 0) break
  }
  return(xp)
}

Within a while loop, I check whether the vector x and a given permutation of x, called xp, share any fixed point. If they share none, I break out of the loop and return xp.

As the results show, it works fine:

> derangr(1:10)
 [1]  4  5  6 10  7  2  1  9  3  8

> derangr(LETTERS)
 [1] "C" "O" "L" "J" "A" "I" "Y" "M" "G" "T" "S" "R" "Z" "V" "N" "K" "D" "Q" "B" "H" "F" "E" "X" "W" "U" "P"
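The cost of the rejection loop above depends on how often a plain sample() call already happens to be a derangement. For a vector of distinct values this share is known to approach 1/e (about 0.37) as the length grows, so the loop should need roughly e, or a little under three, draws on average regardless of length. The following is a minimal sketch that estimates the share empirically; is_derangement() is a hypothetical helper name and the seed is arbitrary, neither comes from the original post.

```r
# Estimate how often a plain sample() call already is a derangement.
# is_derangement() is a hypothetical helper, not part of the original post.
is_derangement <- function(xp, x) !any(xp == x)

set.seed(1)                        # arbitrary seed, only for reproducibility
x <- 1:10
hits <- replicate(10000, is_derangement(sample(x), x))
accept_rate <- mean(hits)

accept_rate        # should land close to exp(-1)
1 / accept_rate    # rough number of sample() calls needed per derangement
```

Note that this only holds for distinct values; with repeated values the comparison xp == x matches more often, so fewer samples pass the check.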

So I'm wondering whether there is a better way of doing this, potentially by replacing the while loop with some kind of vectorization. I also want to keep an eye on scalability.

Here is the microbenchmark for both examples:

library(microbenchmark)

> microbenchmark(derangr(1:10),times = 10000)
Unit: microseconds
          expr   min     lq    mean  median      uq      max neval
 derangr(1:10) 8.359 15.492 40.1807 28.3195 49.4435 6866.453 10000

> microbenchmark(derangr(LETTERS),times = 10000)
Unit: microseconds
             expr    min     lq     mean  median      uq      max neval
 derangr(LETTERS) 24.385 31.123 34.75819 32.4475 34.3225 10200.17 10000

The same question applies to the converse, producing permutations with a given number of fixed points n:

arrangr <- function(x,n){
  while(TRUE){
    xp <- sample(x)
    if(sum(xp == x) == n) break
  }
  return(xp)
}
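For larger n, the loop in arrangr() can take many draws, because permutations with many fixed points are rare. One possible alternative, sketched here under the assumption that fixed points are counted by position (which matches counting by value only when all values are distinct), is to pick the n fixed positions first and then derange only the remaining ones. The name arrangr2 is hypothetical and does not appear in the original post.

```r
# Hypothetical helper (not from the original post): build a permutation with
# exactly n fixed positions by construction instead of rejecting whole samples.
arrangr2 <- function(x, n) {
  len <- length(x)
  # Exactly len - 1 fixed points is impossible: the last element has nowhere to go.
  if (n < 0 || n > len || len - n == 1) {
    stop("no permutation of length(x) elements has exactly n fixed points")
  }
  ind   <- seq_len(len)
  fixed <- sample.int(len, n)     # positions that stay where they are
  free  <- setdiff(ind, fixed)    # positions that must all move
  indp  <- ind
  if (length(free) > 0) {
    repeat {
      perm <- free[sample.int(length(free))]  # shuffle the free positions
      if (!any(perm == free)) break           # accept only a derangement
    }
    indp[free] <- perm
  }
  x[indp]
}

# Usage: 3 of the 10 positions keep their original value
arrangr2(1:10, 3)
```

The rejection step still exists, but it now only has to find a derangement of the free positions, which succeeds about as often as in derangr() no matter how large n is. Indexing through sample.int() also avoids the behaviour of sample(), which treats a single number k as 1:k.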

Recommended answer

If your vector does not contain only unique values, you could rearrange an index instead and use it to subset the input vector in a new order. In this case, if you have for example rep(LETTERS, 2), the first A and the second A would be interchangeable. The derangr() function proposed in the question would also rearrange these. Because the check below compares positions rather than values, a result may still show an A where another A used to be.

derangr2 <- function(x){
  ind <- seq_along(x)
  while(TRUE){
    indp <- sample(ind)
    if(sum(indp == ind) == 0) break
  }
  return(x[indp])
}
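To make the difference between deranging positions and deranging values concrete, here is a small sketch on a vector with repeated values. It repeats the definition of derangr2() so that it runs on its own; the test vector and the seed are illustrative choices, not taken from the original post.

```r
# derangr2() as defined in the answer: derange the index, then subset.
derangr2 <- function(x) {
  ind <- seq_along(x)
  while (TRUE) {
    indp <- sample(ind)
    if (sum(indp == ind) == 0) break
  }
  return(x[indp])
}

x <- c("A", "A", "B", "B")        # illustrative input with repeated values
set.seed(42)                      # arbitrary seed, only for reproducibility
res <- replicate(1000, derangr2(x))   # 4 x 1000 character matrix, one draw per column

# Share of draws in which at least one position still shows its original value
value_match_share <- mean(colSums(res == x) > 0)
value_match_share
```

For this input, 5 of the 9 index derangements leave at least one value visibly in place, so the share should come out a little above one half. Whether that is acceptable depends on whether the duplicates are meant to be treated as interchangeable.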

Some benchmark results:

microbenchmark(derangr(rep(LETTERS, 4)), 
               derangr2(rep(LETTERS, 4)), times = 1000)

# Unit: microseconds
#                      expr   min       lq       mean  median      uq      max neval
#  derangr(rep(LETTERS, 4)) 6.258 113.4895 441.831094 251.724 549.384 5837.143  1000
# derangr2(rep(LETTERS, 4)) 6.542   7.3960  23.173800  12.800  22.755 4645.936  1000

However, if your vector contains only unique values, this approach does not offer much of an improvement.

microbenchmark(derangr(1:1000), derangr2(1:1000), times = 1000)
# Unit: microseconds
#             expr    min     lq     mean median      uq      max neval
#  derangr(1:1000) 19.341 21.333 61.55154 40.959 78.0775 2770.382  1000
# derangr2(1:1000) 23.608 25.884 72.76647 46.079 84.1930 2674.243  1000
