在 SAS 中使用具有滞后的前向后向方法填充缺失值 [英] Filling in missing values with forward-backward method with lag in SAS

查看:77
本文介绍了在 SAS 中使用具有滞后的前向后向方法填充缺失值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设您有一个表,其中包含用户名、计数器和每个计数器的分数.

Assume that you have a table with user name, counter and score for each counter.

data have;
input user $  counter  score;
cards;
A 1 .
A 2 .
A 3 40
A 4 .
A 5 20
A 6 .
B 1 30
B 2 .
C 1 .
C 2 .
C 3 .
;
run;

某些计数器之间缺少某些分数,您想输入与前一个计数器相同的分数.所以结果将如下所示:

Some scores are missing beween some counters, and you want to put the same score as previous counter. So the result will look like below:

A 1 40
A 2 40
A 3 40
A 4 40
A 5 20
A 6 20
B 1 30
B 2 30
C 1 .
C 2 .
C 3 .

我设法通过使用 lag 函数来填补缺失的分数值,如下所示:

I managed to fill the missing score values forward by using the lag function like below:

data result1a;
  set have(keep=user);
  by user;

  *Look ahead;
    merge have have(firstobs=2 keep=score rename=(score=_NextScore));

    if first.user then do;
        if score= . then score=_NextScore;
        end;
    else do;
        _PrevScore = lag(score);
        if score= . then score=_PrevScore;
    end;
    output;
run;

然后我使用 counter 上的 descending 函数对表格进行向后排序,如下所示:

Then I sorted the table backward by using descending funtion on counter like below:

proc sort data = result1a out= result1b; 
by user descending counter ;
run;

最后,我将再次使用 lag 函数在 raarranged 表中向前填充缺失值(根据初始表向后),如下所示.

Then finally I would fill the missing values forward in raaranged table (going backward according to the initial table) by using the lag function again like below.

我在 do-loop 中使用了 lag 函数,因为我想在每一步更新之前的值(例如,值 40 将从一路第一分到最后一分).

I used the lag function in do-loop, because I wanted to update the previous value in each step (For example, the value 40 would be carried from the first score to the last score in the group all the way).

但是,我得到了奇怪的结果.所有缺失值都没有真正的价值.关于修复最后一个数据步骤的任何想法?

However, I get strange result. All missing values don't geta real value. Any idea about fixing the last data-step?

data result1c;
set result1b;
by user;

   if first.user then do;
        if score= . then score=_NextScore;
        else score = score;

        end;
   else do;
        _PrevScore = lag(score);
        if score= . then 
        score=_PrevScore;
        else score = score;
   end;
   output;
run;

推荐答案

不需要使用lag,使用retain(或等效).这是一个双 DoW 循环解决方案,它在一个数据步中完成(实际上,一次读取 - 它缓冲读取,因此这与单次读取一样有效).

Don't need to use lag, use retain (or equivalent). Here's a double DoW loop solution that does it in one datastep (and, effectively, one read - it buffers the read so this is as efficient as a single read).

首先,我们遍历数据集以获取找到的第一个分数,因此我们可以将其作为初始 prev_score 值.然后设置它,并重新循环该用户的行并输出.这里没有实际的 retain 因为我自己在循环,但它类似于如果有一个 retain prev_score; 这是一个正常的数据步骤循环.我实际上并没有保留它,因为我希望它在遇到新用户时丢失.

First we loop through the dataset to get the first score found, so we can grab that for the initial prev_score value. Then setting that, and re-looping through the rows for that user and outputting. There's no actual retain here since I am doing the looping myself, but it's similar to if there were a retain prev_score; and this was a normal data step loop. I don't actually retain it since I want it to go missing when a new user is met.

data want;
  do _n_ = 1 by 1 until (last.user);
    set have;
    by user;
    if missing(first_score) and not missing(score) then 
      first_score = score;

  end;
  prev_score = first_score;
  do _n_ = 1 by 1 until (last.user);
    set have;
    by user;
    if missing(score) then
      score = prev_score;
    prev_score = score;
    output;
  end;
run;

这篇关于在 SAS 中使用具有滞后的前向后向方法填充缺失值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆