How to write first non-missing value to first missing observations

Problem description

I have the following dataset:

id  rate
 1     .
 2     .
 3  0.01
 4  0.02
 5     .
 6     .

How can I efficiently write the first non-missing rate to the observations before it, and the last non-missing rate to the observations after it? The result should be:

id  rate
 1  0.01
 2  0.01
 3  0.01
 4  0.02
 5  0.02
 6  0.02

I can fill the trailing values (see the code below), but I have no idea how to fill the leading ones without resorting to auxiliary tables. With millions of records, those may (or may not) make this very inefficient.

data have;
    input id rate;
    datalines;
     1     .
     2     .
     3  0.01
     4  0.02
     5     .
     6     .
;
run;

data want(drop=previous);
    set have;
    retain previous;
    if not nmiss(rate) then previous = rate;
    else rate = previous;
run;

Recommended answer

I'm not sure what the most efficient answer is. One option is to fill the trailing values with the method you already have, then sort the data set in reverse order and fill the remaining values with the same method. Here is an alternative I came up with; I haven't measured how efficient it is, but you could try it:

data have;
input id group rate;
datalines;
 1  1    .
 2  1    .
 3  1 0.01
 4  1 0.02
 5  1    .
 6  1    .
 7  2    .
 8  2    .
 9  2    .
10  2 0.03
11  2 0.07
12  2    .
;

data want(drop=next initial);
    retain next;
    do until (rate ne . or last.group);
        set have;
        by group id;
    end;
    if rate ne . then next = rate;
    do until (initial ne . or last.group);
        set have;
        by group id;
        initial = rate;
        if initial = . then rate = next;
        output;
    end;
run;

I added a group variable since you said you would be working with by groups. The program reads the data twice: once to find the next non-missing rate, and once to apply that rate to the missing values and output to the final data set. "next" is the value that replaces missing rates, and "initial" is the original rate of each observation.
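
Because "next" is retained across iterations of the data step, a group whose rates are all missing would appear to inherit the value left over from the previous group. The following is a minimal sketch of one way to guard against that, assuming such rows should stay missing; the final "if last.group" line is my addition and is not part of the original answer.

```sas
data want(drop=next initial);
    retain next;
    /* Loop 1: read ahead to the next non-missing rate (or the end of the group). */
    do until (rate ne . or last.group);
        set have;
        by group id;
    end;
    if rate ne . then next = rate;
    /* Loop 2: re-read the same rows, fill the gaps, and write them out. */
    do until (initial ne . or last.group);
        set have;
        by group id;
        initial = rate;
        if initial = . then rate = next;
        output;
    end;
    /* Assumed addition: forget the carried value once the group is finished. */
    if last.group then next = .;
run;
```

Both loops stop on the same observation in every iteration, so when the second loop ends on the last row of a group, the group is complete and it should be safe to clear "next" there.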

For each group, processing goes like this. The first "do until" loop reads through the data set "have" until it finds a non-missing rate, and "next" is set to that rate. The second "do until" loop then reads the same observations, setting "rate" to "next" for each missing rate, until it reaches that first non-missing rate. After that, both loops step through the non-missing rates one at a time and the second loop outputs them, so the last non-missing rate ends up saved in "next". Finally, the loops cycle through the missing rates at the end of the group, replacing them with the value in "next" and outputting them.
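
For comparison, the simpler option mentioned at the start of this answer (fill forward, reverse the order, fill forward again) could look roughly like the sketch below. It assumes "have" is already sorted by group and id, as in the sample data; the intermediate data set names "fwd" and "bwd" are placeholders of mine, and the two sorts may be costly on millions of records.

```sas
/* Pass 1: carry the last non-missing rate forward within each group. */
data fwd(drop=previous);
    set have;
    by group id;
    retain previous;
    if first.group then previous = .;   /* do not carry values across groups */
    if not missing(rate) then previous = rate;
    else rate = previous;
run;

/* Reverse the order within each group so leading gaps become trailing gaps. */
proc sort data=fwd;
    by group descending id;
run;

/* Pass 2: the same carry-forward, now filling the originally leading gaps. */
data bwd(drop=previous);
    set fwd;
    by group descending id;
    retain previous;
    if first.group then previous = .;
    if not missing(rate) then previous = rate;
    else rate = previous;
run;

/* Restore the original order. */
proc sort data=bwd out=want;
    by group id;
run;
```

On the sample data this should give the same "want" as the two-loop program, at the price of two extra sorts and two intermediate data sets.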
