How to create a stratified sample by state in R

Problem description

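How can I create a stratified sample in R using the "sampling" package? My dataset has 355,000 observations. The code works fine up to the last line. Below is the code I wrote, but I always get the following message: "Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list?"

Please do not point me to older messages on Stackoverflow. I researched them, but have not been able to use them. Thank you.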

## lpdata file has 355,000 observations
# Exclude Puerto Rico, Virgin Islands and Guam
sub.lpdata<-subset(lpdata,"STATE" != 'PR' | "STATE" != 'VI' | "STATE" != 'GU')

## Create a 10% sample, stratified by STATE
sort.lpdata<-sub.lpdata[order(sub.lpdata$STATE),]
tab.state<-data.frame(table(sort.lpdata$STATE))
size.strata<-as.vector(round(ceiling(tab.state$Freq)*0.1))

s<-strata(sort.lpdata,stratanames=sort.lpdata$STATE,size=size.strata,method="srswor")}

Solution

I'm not familiar with the strata function, but a bit of coding might do what you want:

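# Toy data: 10 strata (a to j) with 35,000 rows each
d <- expand.grid(id = 1:35000, stratum = letters[1:10])

p <- 0.1  # sampling fraction per stratum

dsample <- data.frame()

system.time(
for (i in levels(d$stratum)) {
  dsub <- subset(d, d$stratum == i)               # rows of the current stratum
  B <- ceiling(nrow(dsub) * p)                    # sample size for this stratum
  dsub <- dsub[sample(seq_len(nrow(dsub)), B), ]  # draw B rows without replacement
  dsample <- rbind(dsample, dsub)                 # append to the running sample
}
)

# size per stratum in the resulting data frame is 10% of the original size:
table(dsample$stratum)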
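
The same per-stratum draw can also be written without growing a data frame inside a loop. The following is only a sketch in base R, reusing the toy data from the answer; the names `pieces` and `dsample2` and the fixed seed are illustrative assumptions, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed so the draw is reproducible

# Same toy data as above: 10 strata with 35,000 rows each
d <- expand.grid(id = 1:35000, stratum = letters[1:10])
p <- 0.1     # sampling fraction per stratum

# Split the data by stratum, sample within each piece, then stack the pieces
pieces <- lapply(split(d, d$stratum), function(g) {
  n <- ceiling(nrow(g) * p)                  # sample size for this stratum
  g[sample.int(nrow(g), n), , drop = FALSE]  # simple random sample without replacement
})
dsample2 <- do.call(rbind, pieces)

# Rows drawn per stratum
table(dsample2$stratum)
```

Binding the pieces once at the end avoids copying the accumulated result on every iteration, which may matter more with many strata than it does in this small example.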

HTH, Kay

PS: CPU time on my relic of a laptop is 0.09 seconds!
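
For readers who do want to stay with the sampling package, here is a hedged sketch of how the question's code could be adjusted. It rests on two observations. First, `stratanames` in `strata()` expects the name of the stratification column ("STATE"), not the column itself, which is a likely cause of the `sort.list` error. Second, the `subset()` condition in the question compares string literals and joins them with `|`, so it never excludes anything. The `lpdata` built here is made-up toy data standing in for the real file, and `lp.sample` is an illustrative name.

```r
set.seed(42)  # assumption: fixed seed for reproducibility

# Toy stand-in for the real lpdata file (assumption: STATE is a character column)
lpdata <- data.frame(
  id    = 1:1000,
  STATE = sample(c("CA", "NY", "TX", "PR", "VI", "GU"), 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Exclude Puerto Rico, Virgin Islands and Guam: unquoted column name, %in% test
sub.lpdata <- subset(lpdata, !(STATE %in% c("PR", "VI", "GU")))

# Sort by STATE so the strata appear in the same order as the size vector
sort.lpdata <- sub.lpdata[order(sub.lpdata$STATE), ]
tab.state   <- table(sort.lpdata$STATE)
size.strata <- as.vector(ceiling(tab.state * 0.1))  # 10% per state, rounded up

# Only run the package call if sampling is installed
if (requireNamespace("sampling", quietly = TRUE)) {
  s <- sampling::strata(sort.lpdata, stratanames = "STATE",
                        size = size.strata, method = "srswor")
  lp.sample <- sampling::getdata(sort.lpdata, s)  # pull the sampled rows
  print(table(s$STATE))                           # rows drawn per state
}
```

If STATE is stored as a factor in the real data, dropping unused levels (for example with `droplevels()`) before calling `table()` would keep zero-count states out of `size.strata`.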

