Random Sample with multiple probabilities in R

Problem Description

I need to draw a sample of subjects from a list and assign them as a Control Group for a study, and that group has to have a similar composition of variables. I am trying to do this in R with the sample() function, but I don't know how to specify different probabilities for each variable. Let's say I have a table with the following headers:

ID Name Campaign Gender

I need a sample of 10 subjects with the following composition of Campaign attributes:

D2D --> 25%

F2F --> 38%

TM --> 17%

WW --> 21%

This means that in my data set, 25% of subjects come from a Door to Door Campaign (D2D), 38% from a Face to Face Campaign (F2F), etc.

The gender composition is as follows:

Male --> 54%

Female --> 46%

When I get a random sample of 10 subjects I need it to have a similar composition.
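To make "similar composition" concrete, here is a minimal sketch on made-up data (the names subjects, pop_campaign, srs and srs_campaign are hypothetical) that compares the Campaign shares in the full list with those in a plain random sample of 10 rows drawn with sample(). A simple random sample reproduces the population shares only on average; with just 10 subjects a single draw can be noticeably off, which is the problem stratification addresses.

```r
# Toy stand-in for the real subject list (hypothetical data, 200 rows)
set.seed(1)
n_subjects <- 200
subjects <- data.frame(
  ID = seq_len(n_subjects),
  Campaign = sample(c("D2D", "F2F", "TM", "WW"), n_subjects, replace = TRUE,
                    prob = c(0.25, 0.38, 0.17, 0.21)),
  Gender = sample(c("Male", "Female"), n_subjects, replace = TRUE,
                  prob = c(0.54, 0.46))
)

# Composition of the full list: share of subjects in each Campaign category
pop_campaign <- prop.table(table(subjects$Campaign))

# Simple random sample of 10 rows, with no control over composition
srs <- subjects[sample(nrow(subjects), 10), ]

# Composition of the sample; reusing the population's categories as factor
# levels makes categories absent from the sample show up with share 0
srs_campaign <- prop.table(table(factor(srs$Campaign, levels = names(pop_campaign))))

comparison <- data.frame(
  Campaign   = names(pop_campaign),
  population = round(as.numeric(pop_campaign), 2),
  sample     = round(as.numeric(srs_campaign), 2)
)
print(comparison)
```

Rerunning the sampling line with different seeds shows how much the sample column can wander while the population column stays fixed.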

I have been searching for hours, and the closest I was able to get to anything similar was this answer: taking data sample in R. However, I need to assign more than one probability.

I am sure that this could help anyone who wants to get a representative sample from a data set.

Recommended Answer

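It sounds like you are interested in taking a stratified random sample: divide the subjects into strata (here, every observed Campaign × Gender combination) and sample from each stratum in proportion to its size. You could do this using the stratsample() function from the survey package.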
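Proportional allocation is the arithmetic behind this approach: a stratum holding a share p of the data should receive about n × p of the n sampled subjects. Those numbers are rarely whole, so they must be rounded. A minimal sketch, assuming you want the per-stratum counts to add up to exactly n, uses the largest-remainder method; alloc_proportional() is a hypothetical helper, not part of the survey package.

```r
# Hypothetical helper: split a total sample size n across strata in proportion
# to stratum size, using largest remainders so the counts sum exactly to n.
alloc_proportional <- function(stratum, n) {
  props  <- prop.table(table(stratum))  # share of each stratum in the data
  exact  <- as.vector(n * props)        # ideal, usually fractional, counts
  counts <- floor(exact)                # whole part for every stratum
  short  <- n - sum(counts)             # units still to hand out
  if (short > 0) {
    # one extra unit each to the strata with the largest fractional remainders
    extra <- order(exact - counts, decreasing = TRUE)[seq_len(short)]
    counts[extra] <- counts[extra] + 1
  }
  setNames(counts, names(props))
}

# 100 subjects split 25 / 38 / 17 / 20 across four campaigns
strata <- rep(c("D2D", "F2F", "TM", "WW"), times = c(25, 38, 17, 20))
alloc_proportional(strata, 10)
# D2D F2F  TM  WW
#   2   4   2   2
```

The result is a named vector of the kind stratsample() takes as its counts argument. Unlike plain round(), it never over- or under-shoots n, but a very small stratum can receive 0 subjects. Forcing at least one subject per stratum, as the function in this answer does, avoids that at the cost of the total possibly differing from n.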

In the example below, I create some fake data to mimic what you have, define a function that takes a proportional stratified random sample, and then apply that function to the fake data.

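# install.packages("survey")  # if the package is not installed yet
library(survey)  # provides stratsample()

set.seed(123)  # make the example reproducible

# example data
ndf <- 1000
df <- data.frame(ID=sample(ndf), Name=sample(ndf),
    Campaign=sample(c("D2D", "F2F", "TM", "WW"), ndf, prob=c(0.25, 0.38, 0.17, 0.21), replace=TRUE),
    Gender=sample(c("Male", "Female"), ndf, prob=c(0.54, 0.46), replace=TRUE))

# function to take a random proportional stratified sample of size n
rpss <- function(stratum, n) {
    props <- table(stratum)/length(stratum)  # share of each stratum in the data
    nstrat <- as.vector(round(n*props))      # proportional sample size per stratum
    nstrat[nstrat==0] <- 1                   # take at least one subject per stratum
    names(nstrat) <- names(props)            # stratsample() matches counts to strata by name
    stratsample(stratum, nstrat)             # row indices of the sampled subjects
    }

# take a random proportional stratified sample of size 10;
# the strata are all observed Campaign x Gender combinations.
# Because of rounding and the at-least-one rule, the sample size can differ slightly from 10.
selrows <- rpss(stratum=interaction(df$Campaign, df$Gender, drop=TRUE), n=10)
df[selrows, ]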
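If installing survey is not an option, the same per-stratum draw can be sketched in base R. The function below is a hypothetical stand-in, not the survey package's code: it assumes counts is a named vector like the nstrat built inside rpss(), with names matching the stratum labels, and draws that many row indices at random from each stratum.

```r
# Hypothetical base-R alternative to survey::stratsample(): for each stratum s,
# draw counts[[s]] row indices at random, without replacement.
strat_sample_base <- function(stratum, counts) {
  stratum <- as.character(stratum)
  rows_by_stratum <- split(seq_along(stratum), stratum)  # row indices per stratum
  picked <- lapply(names(counts), function(s) {
    rows <- rows_by_stratum[[s]]
    # index via sample.int(): sample(rows, k) would misbehave if rows had length 1
    rows[sample.int(length(rows), counts[[s]])]
  })
  unlist(picked)
}

# Toy strata: 5 + 8 + 3 subjects; take 2, 3 and 1 of them respectively
set.seed(42)
strat <- rep(c("D2D.Male", "F2F.Female", "TM.Male"), times = c(5, 8, 3))
rows <- strat_sample_base(strat, c(D2D.Male = 2, F2F.Female = 3, TM.Male = 1))
table(strat[rows])
```

With the data above, passing interaction(df$Campaign, df$Gender, drop=TRUE) as stratum and a named count vector built the same way as nstrat would play the role of stratsample(). sample.int() stops with an error if a stratum is asked for more subjects than it contains.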
