Simple way to do a weighted hot deck imputation in Stata?

Views: 62  Posted: 2021/7/14 20:38:11  Tags: sas stata imputation

Problem description

I'd like to do a simple weighted hot deck imputation in Stata. In SAS, the equivalent command would be the following (note that this is a newer SAS feature, introduced with SAS/STAT 14.1 in 2015 or so):

proc surveyimpute method=hotdeck(selection=weighted);

For clarity then, the basic requirements are:

Imputations must be row-based or simultaneous. If row 1 donates x to row 3, then it must also donate y.

Must account for weights. A donor with weight=2 should be twice as likely to be selected as a donor with weight=1.
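One way to state the weight requirement precisely, using notation that is not in the original post: let $D$ be the set of eligible donors for a given recipient and $w_j$ the weight of donor $j$. Under probability-proportional-to-weight selection, which is what the requirement appears to describe, each donor is chosen with probability

```latex
\Pr(\text{donor } j \text{ is selected}) = \frac{w_j}{\sum_{k \in D} w_k}, \qquad j \in D
```

so two eligible donors with weights 2 and 1 are selected with probabilities in the ratio 2:1.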

I'm assuming the missing data is rectangular. In other words, if the set of potentially missing variables consists of x and y, then either both are missing or neither is missing. Here's some code to generate sample data.

global miss_vars "wealth income"
global weight    "weight"

set obs 6
gen id = _n
gen type = id > 3
gen income = 5000 * _n
gen wealth = income * 4 + 500 * uniform()
gen weight = 1
replace weight = 4 if mod(id-1,3) == 0

// set income & wealth missing every 3 rows
gen impute = mod(_n,3) == 0
foreach v in $miss_vars {
    replace `v' = . if impute == 1
}

The data looks like this:

            id       type     income     wealth     weight     impute
  1.         1          0       5000   20188.03          4          0
  2.         2          0      10000   40288.81          1          0
  3.         3          0          .          .          1          1
  4.         4          1      20000   80350.85          4          0
  5.         5          1      25000   100378.8          1          0
  6.         6          1          .          .          1          1
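The rectangular-missingness assumption can be checked directly on data like this. The following is a minimal sketch that assumes the globals and the impute flag from the sample code are defined; n_miss and n_vars are names introduced here for illustration only.

```stata
* Count how many of the imputation variables are missing in each row.
egen n_miss = rowmiss($miss_vars)
local n_vars : word count $miss_vars

* Rectangular means every row is missing either none or all of them.
assert inlist(n_miss, 0, `n_vars')

* The impute flag should mark exactly the rows with missing values.
assert impute == (n_miss > 0)
drop n_miss
```

If either assert fails, Stata stops with an error, which signals that a row-based donation would need extra rules for partially missing rows.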

So in other words, for each row with missing values we need to randomly (with weighting) select a donor observation of the same type, and use that donor to fill in both the income and wealth values. In practical use the generation of the type variable is of course its own problem, but I'm keeping that very simple here to focus on the main issue.

For example, after the hot deck, row 3 might look like either of the following, because it fills both income and wealth from row 1, or both from row 2 (in contrast, it would never take income from row 1 and wealth from row 2):

  3.         3          0       5000   20188.03          1          1
  3.         3          0      10000   40288.81          1          1

Also, since row 1 has weight=4 and row 2 has weight=1, row 1 should be the donor 80% of the time and row 2 should be the donor 20% of the time.
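The requirements above can be sketched in plain Stata without any add-on command. This is one possible approach under stated assumptions, not an equivalent of the SAS procedure: it assumes the sample data above is in memory, that all weights are positive, and that the code is run as one block so the temporary files stay in scope. Each donor gets a slice of the unit interval whose width is its share of the donor weight within its type, and each recipient takes the donor whose slice contains a uniform draw. The names donor_id, donor_*, cumw, lo, hi and u are introduced here for illustration.

```stata
set seed 12345                       // arbitrary seed, only for reproducibility

* 1. Donor file: one slice [lo, hi) per donor, width proportional to weight.
preserve
keep if impute == 0
keep id type $weight $miss_vars
rename id donor_id
foreach v of global miss_vars {
    rename `v' donor_`v'
}
bysort type (donor_id): gen double cumw = sum($weight)
by type: gen double hi = cumw / cumw[_N]
by type: gen double lo = cond(_n == 1, 0, hi[_n-1])
keep type donor_* lo hi
tempfile donors
save `donors'
restore

* 2. Recipients: one uniform draw each, paired with every donor of the
*    same type, keeping only the donor whose slice contains the draw.
preserve
keep if impute == 1
keep id type
gen double u = runiform()
joinby type using `donors'
keep if u >= lo & u < hi
keep id donor_*
tempfile picks
save `picks'
restore

* 3. Copy all imputation variables from the single chosen donor row.
merge 1:1 id using `picks', nogenerate
foreach v of global miss_vars {
    replace `v' = donor_`v' if impute == 1
    drop donor_`v'
}
```

Because every variable in $miss_vars is copied from the same donor row, income and wealth always come from one donor. Donors are drawn with replacement, so one donor may serve several recipients, and a recipient whose type has no donors is left missing. donor_id is kept so the chosen donor can be inspected. Note that joinby forms every recipient-donor pair within a type, which may be slow when the cells are large.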
