Permuting elements of a vector 10,000 times - efficiently? (R)

Question

This question is quite straightforward. However, the solutions I have found are extremely inefficient in both memory and time. I am wondering whether this can be done in R without grinding one's machine into dust.

Take the vector:

x <- c("A", "B", "B", "E", "C", "C", "D", "E", "A", "C")

This vector has 10 elements, five of them unique. Importantly, some elements are repeated, and any permutation must contain the same number of each element as the original. I want to permute this vector 10,000 times, with each permutation randomly generated and distinct from the others. With my real data, the vectors could have up to 1,000 elements, which can be very hard to do efficiently.

To get one permutation, you can just do:

sample(x)

or, from the gtools package:

permute(x)

I could write some code to do that 10,000 times, but I am likely to get duplicates. Is there a way of doing this and dropping duplicates until 10,000 is reached?

Other similar questions on Stack Overflow and Cross Validated (stats.stackexchange.com) have asked about generating all the unique permutations of a sequence. These questions are here:

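Shuffling a vector - all possible outcomes of sample()?

Generating all distinct permutations of a list in R

https://stats.stackexchange.com/questions/24300/how-to-resample-in-r-without-repeating-permutations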

These are good, and the suggestions for generating all the unique permutations are great. It would certainly be quite easy to run them and then draw 10,000 random samples from the result. However, once a vector goes beyond about 10 elements, this becomes very memory intensive.

Any comments about how to do this efficiently for up to 1,000 elements would be appreciated. This has me getting very dizzy.

Recommended answer

I don't think the computations need to be as expensive as you are making them out to be. For small "x" vectors, you might want to overshoot a little (here, I've sort of overdone it), then check for duplicates using duplicated. If too many rows are duplicated to leave you with the desired 10,000, repeat the process to make up the difference, using rbind to add the rows you want to keep to the matrix you get from replicate. This could be implemented in a while loop.

x <- c("A", "B", "B", "E", "C", "C", "D", "E", "A", "C")
set.seed(1)
N <- t(replicate(15000, sample(x)))
sum(duplicated(N))
# [1] 1389
out <- N[!(duplicated(N)), ][1:10000, ]
head(out)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] "B"  "E"  "C"  "D"  "B"  "E"  "A"  "C"  "C"  "A"  
# [2,] "B"  "B"  "C"  "C"  "C"  "D"  "E"  "E"  "A"  "A"  
# [3,] "C"  "B"  "C"  "A"  "A"  "E"  "D"  "C"  "B"  "E"  
# [4,] "C"  "C"  "E"  "B"  "C"  "E"  "A"  "A"  "D"  "B"  
# [5,] "A"  "C"  "D"  "E"  "E"  "C"  "A"  "B"  "B"  "C"  
# [6,] "C"  "E"  "E"  "B"  "A"  "C"  "D"  "A"  "B"  "C"
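
The while loop mentioned above is not spelled out in the answer. A minimal sketch of one way it might look follows. The function name unique_perms and its batch argument are hypothetical, introduced only for illustration, and the sketch assumes x has at least two elements.

```r
# Hypothetical helper (not part of the original answer): keep drawing
# batches of random permutations of `x` until `n` distinct rows exist.
# Assumes length(x) >= 2.
unique_perms <- function(x, n, batch = ceiling(1.5 * n)) {
  # Log of the number of distinct permutations of x:
  # log(length(x)!) minus the sum of log(count!) over each distinct value.
  log_total <- lfactorial(length(x)) - sum(lfactorial(as.vector(table(x))))
  if (log(n) > log_total + 1e-8) {
    stop("x has fewer than n distinct permutations")
  }

  # Start from an empty matrix of the same type as x.
  out <- matrix(x[0], nrow = 0, ncol = length(x))
  while (nrow(out) < n) {
    new <- t(replicate(batch, sample(x)))         # one permutation per row
    out <- rbind(out, new)                        # append the new batch
    out <- out[!duplicated(out), , drop = FALSE]  # drop repeated rows
  }
  out[seq_len(n), , drop = FALSE]
}

x <- c("A", "B", "B", "E", "C", "C", "D", "E", "A", "C")
set.seed(1)
out <- unique_perms(x, 10000)
dim(out)
```

The up-front check stops the loop from running forever when fewer than n distinct permutations exist. Re-running duplicated on the whole accumulated matrix in every pass is simple but repeats work, so for long vectors it may be worth timing before relying on it.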

The duplicated step is actually the most expensive, as far as I can see:

y <- sample(500, 1000, TRUE)
system.time(N <- t(replicate(12000, sample(y))))
# user  system elapsed 
# 2.35    0.08    2.43 
system.time(D <- sum(duplicated(N)))
#  user  system elapsed 
# 14.82    0.01   14.84 
D
# [1] 0

^^ There, we have no duplicates among our 12,000 samples.
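
The duplicate counts in both runs can be sanity-checked with a short calculation that is not part of the original answer. A vector of length n whose distinct values occur k1, k2, ... times has n! / (k1! * k2! * ...) distinct permutations. The sketch below assumes that sample() draws each distinct permutation with equal probability, and the variable names are illustrative only.

```r
# Distinct permutations of the 10-element example vector.
x <- c("A", "B", "B", "E", "C", "C", "D", "E", "A", "C")
counts <- as.vector(table(x))           # 2, 2, 3, 1, 2
n_distinct <- factorial(length(x)) / prod(factorial(counts))
n_distinct
# [1] 75600

# Expected number of duplicated rows among m uniform random draws:
# m minus the expected number of distinct permutations seen.
m <- 15000
expected_dups <- m - n_distinct * (1 - (1 - 1 / n_distinct)^m)
expected_dups                           # about 1394

# For a long vector the count overflows, so work on the log10 scale.
set.seed(1)
y <- sample(500, 1000, TRUE)
log10_total <- (lfactorial(length(y)) -
                sum(lfactorial(as.vector(table(y))))) / log(10)
log10_total                             # approximate number of decimal digits
```

For the short vector, the expected count is close to the 1389 duplicates reported above. For a 1,000-element vector like y, the number of distinct permutations typically has over two thousand digits, so a duplicate among 12,000 draws is vanishingly unlikely and the duplicated check could arguably be skipped at that size.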
