Sample without replacement, or duplicates, in R

Posted: 2022/4/3 15:48:24 · Tags: r, permutation, random-sample


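Problem description

I have a very long list that contains quite a few duplicates, for example 100,000 values of which 20% are duplicates. I want to sample randomly from this list so that all values end up in groups, say 400 of them. However, I don't want any of the resulting groups to contain duplicate values, i.e. all 250 members of each group should be unique.

I have tried various permutation methods from vegan, picante and EcoSimR, but they either don't do quite what I want or seem to struggle with this amount of data.

Is there some way of using the sample function that I'm missing? Any help or alternative suggestions would be much appreciated.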

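Recommended answer

As nico mentioned, you probably just need the unique function. Below is a very simple sampling routine that ensures no value is repeated across groups (doing it step by step isn't entirely sensible, since you could just draw one large sample...)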

# Getting some random values to use here
set.seed(seed = 14412)
thevalues <- sample(x = 1:100,size = 1000,replace = TRUE)

# Obtaining the unique vector of those values
thevalues.unique <- unique(thevalues)

# Create a sample without replacement (i.e. take the ball out and don't put it back in)
sample1 <- sample(x = thevalues.unique,size = 10,replace = FALSE)

# Remove the sampled items from the vector of values
thevalues.unique <- thevalues.unique[!(thevalues.unique %in% sample1)]

# Another sample, and another removal
sample2 <- sample(x = thevalues.unique,size = 10,replace = FALSE)
thevalues.unique <- thevalues.unique[!(thevalues.unique %in% sample2)]
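
The "one large sample" idea mentioned above can be sketched roughly as follows: draw once without replacement from the unique values, then cut the result into equally sized groups. The helper name split_unique_groups and its arguments are hypothetical, introduced only for illustration, and the sketch assumes there are at least n_groups * group_size unique values.

```r
# Hypothetical helper (not part of the original answer): one draw without
# replacement from the unique values, cut into consecutive groups.
split_unique_groups <- function(values, n_groups, group_size) {
  pool <- unique(values)
  needed <- n_groups * group_size
  if (needed > length(pool)) {
    stop("not enough unique values: need ", needed, ", have ", length(pool))
  }
  # Sample indices rather than values, which avoids sample()'s special
  # handling of a single numeric value
  drawn <- pool[sample.int(length(pool), size = needed)]
  # Label the first group_size draws as group 1, the next as group 2, ...
  split(drawn, rep(seq_len(n_groups), each = group_size))
}

set.seed(14412)
thevalues <- sample(x = 1:100, size = 1000, replace = TRUE)
groups <- split_unique_groups(thevalues, n_groups = 5, group_size = 10)
```

Because every element of groups comes from a single draw without replacement, no value can appear in more than one group; anyDuplicated(unlist(groups)) is a quick way to check this.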

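To do what eipi10 mentioned and sample according to a weighted distribution, you first need the frequency of each value in the data. One way to do that: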

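# Simulated data; prob gives each of the 100 values the same weight
set.seed(seed = 14412)
thevalues <- sample(x = 1:100,size = 1000,replace = TRUE,prob = c(rep(0.01,100)))

# Sort the unique values so their order matches the order used by table()
thevalues.unique <- unique(thevalues)
thevalues.unique <- thevalues.unique[order(thevalues.unique)]
# Relative frequency of each value in the original data
thevalues.probs <- table(thevalues)/length(thevalues)
# Weighted sample without replacement
sample1 <- sample(x = thevalues.unique,
                  size = 10,
                  replace = FALSE,
                  prob = thevalues.probs)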
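
The two ideas in this answer (removing values once they are sampled, and weighting by frequency) can be combined. The following is only a sketch under assumptions not stated in the answer: the group count and size (3 groups of 10) are arbitrary, and the names pool, weights and groups are made up for the example. It relies on the documented behaviour that the prob weights passed to sample.int do not need to sum to one, so the remaining weights can be reused after each removal.

```r
set.seed(14412)
thevalues <- sample(x = 1:100, size = 1000, replace = TRUE)

# Take values and weights from the same table so they stay aligned
freq <- table(thevalues)
pool <- as.integer(names(freq))
weights <- as.numeric(freq) / sum(freq)

n_groups <- 3     # assumed for illustration
group_size <- 10  # assumed for illustration
groups <- vector("list", n_groups)

for (g in seq_len(n_groups)) {
  # Weighted draw of positions in the remaining pool
  idx <- sample.int(length(pool), size = group_size,
                    replace = FALSE, prob = weights)
  groups[[g]] <- pool[idx]
  # Drop the drawn values and their weights so later groups cannot reuse them
  pool <- pool[-idx]
  weights <- weights[-idx]
}
```

Drawing positions with sample.int and indexing into pool keeps the values and their weights in step as both vectors shrink.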
