SAS中使用滞后的向前-向后方法填充缺失值 [英] Filling in missing values with forward-backward method with lag in SAS

查看:1189
本文介绍了SAS中使用滞后的向前-向后方法填充缺失值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设您有一个包含用户名,计数器和每个计数器得分的表.

Assume that you have a table with user name, counter and score for each counter.

data have;
input user $  counter  score;
cards;
A 1 .
A 2 .
A 3 40
A 4 .
A 5 20
A 6 .
B 1 30
B 2 .
C 1 .
C 2 .
C 3 .
;
run;

某些计数器之间缺少某些分数,并且您希望放置与先前计数器相同的分数.因此结果将如下所示:

Some scores are missing beween some counters, and you want to put the same score as previous counter. So the result will look like below:

A 1 40
A 2 40
A 3 40
A 4 40
A 5 20
A 6 20
B 1 30
B 2 30
C 1 .
C 2 .
C 3 .

我设法通过使用如下的lag函数来填补缺少的得分值:

I managed to fill the missing score values forward by using the lag function like below:

data result1a;
  set have(keep=user);
  by user;

  *Look ahead;
    merge have have(firstobs=2 keep=score rename=(score=_NextScore));

    if first.user then do;
        if score= . then score=_NextScore;
        end;
    else do;
        _PrevScore = lag(score);
        if score= . then score=_PrevScore;
    end;
    output;
run;

然后我通过在counter上使用descending功能对表进行反向排序,如下所示:

Then I sorted the table backward by using descending funtion on counter like below:

proc sort data = result1a out= result1b; 
by user descending counter ;
run;

然后,最后,我将再次使用lag函数,将丢失的值向前填充到raaranged表中(根据初始表向后填充),如下所示.

Then finally I would fill the missing values forward in raaranged table (going backward according to the initial table) by using the lag function again like below.

我在do-loop中使用了lag函数,因为我想在每个步骤中更新先前的值(例如,值40将一直从组中的第一个分数一直携带到最后一个分数).

I used the lag function in do-loop, because I wanted to update the previous value in each step (For example, the value 40 would be carried from the first score to the last score in the group all the way).

但是,我得到的结果很奇怪.所有缺失的值都不会获得真正的价值.关于修复最后一个数据步骤有什么想法吗?

However, I get strange result. All missing values don't geta real value. Any idea about fixing the last data-step?

data result1c;
set result1b;
by user;

   if first.user then do;
        if score= . then score=_NextScore;
        else score = score;

        end;
   else do;
        _PrevScore = lag(score);
        if score= . then 
        score=_PrevScore;
        else score = score;
   end;
   output;
run;

推荐答案

不需要使用lag,也可以使用retain(或等价物).这是一个双重DoW循环解决方案,可在一个数据步骤中完成此操作(并且有效地执行一次读取-它缓冲读取操作,因此其效率与单次读取一样.)

Don't need to use lag, use retain (or equivalent). Here's a double DoW loop solution that does it in one datastep (and, effectively, one read - it buffers the read so this is as efficient as a single read).

首先,我们遍历数据集以获得找到的第一个分数,因此我们可以将其获取为初始prev_score值.然后进行设置,然后为该用户重新遍历各行并输出.因为我自己做循环,所以这里没有实际的retain,但这类似于是否有retain prev_score;,这是正常的数据步进循环.我实际上不是retain它,因为我希望它在遇到新用户时会丢失.

First we loop through the dataset to get the first score found, so we can grab that for the initial prev_score value. Then setting that, and re-looping through the rows for that user and outputting. There's no actual retain here since I am doing the looping myself, but it's similar to if there were a retain prev_score; and this was a normal data step loop. I don't actually retain it since I want it to go missing when a new user is met.

data want;
  do _n_ = 1 by 1 until (last.user);
    set have;
    by user;
    if missing(first_score) and not missing(score) then 
      first_score = score;

  end;
  prev_score = first_score;
  do _n_ = 1 by 1 until (last.user);
    set have;
    by user;
    if missing(score) then
      score = prev_score;
    prev_score = score;
    output;
  end;
run;

这篇关于SAS中使用滞后的向前-向后方法填充缺失值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆