Simple way to do a weighted hot deck imputation in Stata?


Question

I'd like to do a simple weighted hot deck imputation in Stata. In SAS, the equivalent command would be the following (note that this is a newer SAS feature, available from SAS/STAT 14.1, released in 2015 or so):

proc surveyimpute method=hotdeck(selection=weighted); 

For clarity, the basic requirements are:

  1. Imputations must be row-based, that is, simultaneous. If row 1 donates x to row 3, then it must also donate y.

  2. Weights must be taken into account. A donor with weight=2 should be twice as likely to be selected as a donor with weight=1.

I'm assuming the missing data is rectangular. In other words, if the set of potentially missing variables consists of x and y, then either both are missing or neither is missing. Here's some code to generate sample data.

global miss_vars "wealth income"
global weight    "weight"

set obs 6
gen id = _n
gen type = id > 3
gen income = 5000 * _n
gen wealth = income * 4 + 500 * runiform()
gen weight = 1
replace weight = 4 if mod(id-1,3) == 0

// set income & wealth missing every 3 rows
gen impute = mod(_n,3) == 0
foreach v in $miss_vars {
    replace `v' = . if impute == 1
}

The data looks like this:

            id       type     income     wealth     weight     impute
  1.         1          0       5000   20188.03          4          0
  2.         2          0      10000   40288.81          1          0
  3.         3          0          .          .          1          1
  4.         4          1      20000   80350.85          4          0
  5.         5          1      25000   100378.8          1          0
  6.         6          1          .          .          1          1

So in other words, for each row with missing values we need to randomly (with weighting) select a donor observation of the same type, and use that donor to fill in both the income and the wealth values. In practical use, generating the type variable is of course its own problem, but I'm keeping that very simple here to focus on the main issue.

For example, after the hot deck row 3 might look like either of the following, because it fills both income and wealth from row 1, or both from row 2. It would never take income from row 1 and wealth from row 2.

  3.         3          0       5000   20188.03          1          1
  3.         3          0      10000   40288.81          1          1

Also, since row 1 has weight=4 and row 2 has weight=1, row 1 should be the donor 80% of the time and row 2 should be the donor 20% of the time.
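The 80/20 split follows from selecting each donor with probability proportional to its weight within its type cell. As a worked equation (the symbols w_i for donor weights and D_c for the set of donors in cell c are notation introduced here, not taken from the SAS or Stata documentation):

```latex
\[
P(\text{donor} = i \mid \text{cell } c) = \frac{w_i}{\sum_{j \in D_c} w_j},
\qquad
P(\text{row 1}) = \frac{4}{4 + 1} = 0.8,
\qquad
P(\text{row 2}) = \frac{1}{4 + 1} = 0.2 .
\]
```

The same numbers apply to row 6, whose candidate donors (rows 4 and 5) also have weights 4 and 1.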

Recommended answer

It appears there is no built-in way to do this in Stata, nor a community-contributed command that does it. There are community-contributed commands that do hot decks (specifically hotdeck, whotdeck, and hotdeckvar), but none of them handles sample weights. The whotdeck command superficially appears to handle weights, but these are internally estimated importance weights, not sample weights.

Hence I wrote a program myself and uploaded it to GitHub. It is called wtd_hotdeck. Please see its GitHub page for more information and any subsequent updates.
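To make the mechanics concrete, here is a minimal sketch of one way to do a weighted, row-based hot deck by hand. It is not the wtd_hotdeck code and makes no claim about how that program works. It assumes the sample data from the question is in memory and that the code is run as a do-file; the seed and the names cumw, lo, hi, u, donor_id, d_income and d_wealth are made up for illustration.

```stata
* Sketch only: weighted hot deck within cells defined by type.
set seed 20240101                       // arbitrary seed, for reproducibility

* 1. Donor file: within each type, give every donor a slice [lo, hi) of the
*    unit interval whose width equals its share of the total donor weight.
preserve
keep if impute == 0
bysort type (id): gen double cumw = sum(weight)
by type: gen double hi = cumw / cumw[_N]
by type: gen double lo = cond(_n == 1, 0, hi[_n-1])
rename (id income wealth) (donor_id d_income d_wealth)
keep type donor_id d_income d_wealth lo hi
tempfile donors
save `donors'
restore

* 2. One uniform draw per recipient; keep the donor whose slice contains it.
preserve
keep if impute == 1
keep id type
gen double u = runiform()
joinby type using `donors'
keep if u >= lo & u < hi
keep id donor_id d_income d_wealth
tempfile picks
save `picks'
restore

* 3. Copy both values from the same donor row, so income and wealth
*    always come from one donor.
merge 1:1 id using `picks', nogenerate
replace income = d_income if impute == 1
replace wealth = d_wealth if impute == 1
drop d_income d_wealth
```

After this runs, donor_id records which row donated to each imputed observation. Because each recipient gets a single draw matched against the donor slices, a donor with four times the weight owns a slice four times as wide and is chosen four times as often. The joinby step forms every recipient-donor pair within a type, which is fine for a small example but could become large on real survey data.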
