Algorithm for sampling without replacement?

Question

I am trying to test the likelihood that a particular clustering of data has occurred by chance. A robust way to do this is Monte Carlo simulation, in which the associations between data and groups are randomly reassigned a large number of times (e.g. 10,000), and a clustering metric is used to compare the actual data with the simulations to determine a p-value.
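The overall procedure might be sketched as follows. This is only an illustration under stated assumptions: `spread_of_group_means` is a hypothetical stand-in for whatever clustering metric is actually used, larger values are assumed to mean stronger clustering, and the `+ 1` in the numerator and denominator is a common convention that counts the observed arrangement as one of the replicates. Python's standard `random.shuffle` does the reassignment without replacement here.

```python
import random
from statistics import mean

def split_into_groups(values, group_sizes):
    """Cut a flat sequence into consecutive groups of the given sizes."""
    groups, start = [], 0
    for size in group_sizes:
        groups.append(values[start:start + size])
        start += size
    return groups

def spread_of_group_means(groups):
    """Hypothetical clustering metric: size-weighted spread of the group means."""
    overall = mean(v for g in groups for v in g)
    return sum(len(g) * (mean(g) - overall) ** 2 for g in groups)

def monte_carlo_p_value(groups, statistic=spread_of_group_means,
                        n_replicates=10_000, seed=None):
    """Estimate how often random regrouping is at least as clustered as the data."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]   # allocated once, reshuffled each time
    observed = statistic(groups)
    hits = 0
    for _ in range(n_replicates):
        rng.shuffle(pooled)                   # random reassignment without replacement
        if statistic(split_into_groups(pooled, sizes)) >= observed:
            hits += 1
    # Assumed convention: count the observed data as one replicate.
    return (hits + 1) / (n_replicates + 1)
```

Group sizes are preserved in every replicate because the pooled values are always cut back into groups of the original sizes.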

I've got most of this working, with pointers mapping the grouping to the data elements, so I plan to randomly reassign the pointers to the data. The question: what is a fast way to sample without replacement, so that every pointer is randomly reassigned in each replicate data set?

For example (these data are just a simplified example):

Data (n=12 values) - Group A: 0.1, 0.2, 0.4 / Group B: 0.5, 0.6, 0.8 / Group C: 0.4, 0.5 / Group D: 0.2, 0.2, 0.3, 0.5

For each replicate data set, I would keep the same cluster sizes (A=3, B=3, C=2, D=4) and the same data values, but would reassign the values to the clusters.

To do this, I could generate a random number in the range 1-12 and assign the first element of group A, then generate a random number in the range 1-11 and assign the second element of group A, and so on. The pointer reassignment is fast, and I will have pre-allocated all data structures, but sampling without replacement seems like a problem that must have been solved many times before.
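Taken literally, the procedure described above might look like the sketch below. The function name and the use of Python lists are assumptions made for illustration; indices are 0-based here rather than 1-based. Each draw picks a position in a pool that shrinks by one, so no value can be chosen twice.

```python
import random

def draw_without_replacement(values, group_sizes, rng=random):
    """Fill each group by drawing from a pool that shrinks after every draw."""
    pool = list(values)                       # working copy; the input is left untouched
    groups = []
    for size in group_sizes:
        group = []
        for _ in range(size):
            i = rng.randrange(len(pool))      # the range shrinks by one per draw
            group.append(pool.pop(i))         # removing the value prevents reuse
        groups.append(group)
    return groups
```

Note that `pool.pop(i)` shifts the remaining elements, so one replicate costs O(n²) in the worst case. That is harmless for n=12 but is the part a swap-based shuffle avoids.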

Logic or pseudocode is preferred.

Recommended answer

See my answer to the question "Unique (non-repeating) random numbers in O(1)?". The same logic should accomplish what you are looking to do.
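The linked answer is not reproduced here. The sketch below assumes it refers to the swap-based technique usually called the Fisher-Yates shuffle: pick a random position at or below the current last index, swap it into the last position, and shrink the range by one. The names `shuffle_in_place`, `next_replicate`, and `indices` are hypothetical; the values and group sizes are the simplified example from the question.

```python
import random

def shuffle_in_place(indices, rng=random):
    """Fisher-Yates shuffle: O(1) work per draw, no extra allocation."""
    for last in range(len(indices) - 1, 0, -1):
        r = rng.randint(0, last)              # inclusive on both ends
        indices[r], indices[last] = indices[last], indices[r]

# Example data from the question, flattened in group order A, B, C, D.
values = [0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.4, 0.5, 0.2, 0.2, 0.3, 0.5]
group_sizes = {"A": 3, "B": 3, "C": 2, "D": 4}

# Allocated once and reused: any permutation is a valid starting point,
# so the array never needs to be reset between replicates.
indices = list(range(len(values)))

def next_replicate(rng=random):
    """Reshuffle the index array and read the groups off it in order."""
    shuffle_in_place(indices, rng)
    replicate, start = {}, 0
    for name, size in group_sizes.items():
        replicate[name] = [values[i] for i in indices[start:start + size]]
        start += size
    return replicate
```

Calling `next_replicate()` 10,000 times would give 10,000 reassignments with the original cluster sizes. The index array plays the role of the pointers in the question, so only integers are moved, not the data.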
