How to Randomly Assign to Groups of Different Sizes

Question

Say I have a dataset and I want to assign observations to different groups, with the size of each group determined by the data. For example, suppose that this is the data:

sysuse census, clear
keep state region pop
order state pop region
decode region, gen(reg)
replace reg="NCntrl" if reg=="N Cntrl"
drop region
*Create global with regions
global region NE NCntrl South West
*Count the number in each region
bys reg (pop): gen reg_N=_N
tab reg

There are four reg groups, all of different sizes. Now, I want to randomly assign observations to the four groups. The code below does this by generating a random number and then assigning observations to one of the groups based on that number.

*Generate random number
set seed 1
gen random = runiform()
sort random
*Assign observations to number based on random sorting
egen reg_rand = seq(), from(1) to(4)
*Map number to region
gen reg_new = ""
global count 1
foreach i in $region {
    replace reg_new = "`i'" if reg_rand==$count
    global count = $count + 1
}
bys reg_new: gen reg_new_N = _N
tab reg_new

This is not what I want, though. Instead of using the seq() function of egen, which creates groups of equal size (assuming N divided by the number of groups is a whole number), I would like to randomly assign based on the sizes of the original groups. In this case, those sizes are given by reg_N. For example, 12 observations would have a reg_new value of NCntrl.

One solution I can think of is similar to https://stats.idre.ucla.edu/stata/faq/how-can-i-randomly-assign-observations-to-groups-in-stata/. The idea would be to save the results of tab reg into a macro or matrix, and then use a loop and replace to cycle through the observations, which are sorted by a random number. Assume that there are many, many more groups than the four in this toy example. Is there a more reasonable way to accomplish this?
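
For reference, the loop idea described above might look roughly like the following. This is only a sketch, assuming the setup code at the top has been run so that reg holds the region labels. The names rand_order, reg_shuf, groups, start and stop are made up for illustration and do not come from the linked FAQ.

```stata
* Sketch only: assign labels in blocks whose sizes match the original groups.
* rand_order and reg_shuf are illustrative names, not from the original post.
set seed 1
gen double rand_order = runiform()
sort rand_order

gen reg_shuf = ""
local start = 1
* Collect the distinct values of reg
levelsof reg, local(groups)
foreach g of local groups {
    * Size of the original group
    quietly count if reg == "`g'"
    local stop = `start' + r(N) - 1
    * Give the next r(N) randomly ordered rows this label
    quietly replace reg_shuf = "`g'" in `start'/`stop'
    local start = `stop' + 1
}

* The two frequency tables should show the same group sizes
tab reg
tab reg_shuf
```

This works for any number of groups, but it runs one count and one replace per group.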

Recommended Answer

It looks like you want to shuffle the values stored in a group variable across observations. You can do this by reducing the data to the group variable, sorting on a variable that contains random values, and then using an unmatched merge (merge 1:1 _n, which pairs rows by observation number) to associate the shuffled group identifiers with the original observations. Assuming that the data example is stored in a file called "data_example.dta" and is currently loaded into memory, this would look like:

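* Assumes the example data were saved beforehand as "data_example.dta"
set seed 234
* Reduce the data in memory to the group variable
keep reg
rename reg reg_new
* Shuffle the group labels by sorting on a random number
gen double u = runiform()
sort u reg_new
* Attach the shuffled labels to the original observations by row position
merge 1:1 _n using "data_example.dta", nogen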

tab reg reg_new
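
If writing the data to disk is inconvenient, the same kind of shuffle can probably be done in memory with explicit subscripting. The following is a sketch under that assumption, not part of the original answer. It only requires that reg is in memory, and obs_id, u_perm, pos and reg_perm are illustrative names.

```stata
* Sketch only: permute reg in memory, without saving or merging a file.
* obs_id, u_perm, pos, and reg_perm are illustrative names.
set seed 234
* Remember the current row order
gen long obs_id = _n
* Sort on a random number; each row's new position is a random rank
gen double u_perm = runiform()
sort u_perm obs_id
gen long pos = _n
* Restore the original order, then take reg from the row at that rank
sort obs_id
gen reg_perm = reg[pos]

* Group sizes are preserved because pos is a permutation of 1 to _N
tab reg reg_perm
```

Because pos takes every value from 1 to _N exactly once, each original label is used exactly once, so reg_perm has the same group sizes as reg.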
