在 SAS 中使用具有滞后的前向后向方法填充缺失值 [英] Filling in missing values with forward-backward method with lag in SAS
问题描述
假设您有一个表,其中包含用户名、计数器和每个计数器的分数.
Assume that you have a table with user name, counter and score for each counter.
data have;
input user $ counter score;
cards;
A 1 .
A 2 .
A 3 40
A 4 .
A 5 20
A 6 .
B 1 30
B 2 .
C 1 .
C 2 .
C 3 .
;
run;
某些计数器之间缺少某些分数,您想输入与前一个计数器相同的分数.所以结果将如下所示:
Some scores are missing beween some counters, and you want to put the same score as previous counter. So the result will look like below:
A 1 40
A 2 40
A 3 40
A 4 40
A 5 20
A 6 20
B 1 30
B 2 30
C 1 .
C 2 .
C 3 .
我设法通过使用 lag
函数来填补缺失的分数值,如下所示:
I managed to fill the missing score values forward by using the lag
function like below:
data result1a;
set have(keep=user);
by user;
*Look ahead;
merge have have(firstobs=2 keep=score rename=(score=_NextScore));
if first.user then do;
if score= . then score=_NextScore;
end;
else do;
_PrevScore = lag(score);
if score= . then score=_PrevScore;
end;
output;
run;
然后我使用 counter
上的 descending
函数对表格进行向后排序,如下所示:
Then I sorted the table backward by using descending
funtion on counter
like below:
proc sort data = result1a out= result1b;
by user descending counter ;
run;
最后,我将再次使用 lag
函数在 raarranged 表中向前填充缺失值(根据初始表向后),如下所示.
Then finally I would fill the missing values forward in raaranged table (going backward according to the initial table) by using the lag
function again like below.
我在 do-loop
中使用了 lag
函数,因为我想在每一步更新之前的值(例如,值 40 将从一路第一分到最后一分).
I used the lag
function in do-loop
, because I wanted to update the previous value in each step (For example, the value 40 would be carried from the first score to the last score in the group all the way).
但是,我得到了奇怪的结果.所有缺失值都没有真正的价值.关于修复最后一个数据步骤的任何想法?
However, I get strange result. All missing values don't geta real value. Any idea about fixing the last data-step?
data result1c;
set result1b;
by user;
if first.user then do;
if score= . then score=_NextScore;
else score = score;
end;
else do;
_PrevScore = lag(score);
if score= . then
score=_PrevScore;
else score = score;
end;
output;
run;
推荐答案
不需要使用lag
,使用retain
(或等效).这是一个双 DoW 循环解决方案,它在一个数据步中完成(实际上,一次读取 - 它缓冲读取,因此这与单次读取一样有效).
Don't need to use lag
, use retain
(or equivalent). Here's a double DoW loop solution that does it in one datastep (and, effectively, one read - it buffers the read so this is as efficient as a single read).
首先,我们遍历数据集以获取找到的第一个分数,因此我们可以将其作为初始 prev_score 值.然后设置它,并重新循环该用户的行并输出.这里没有实际的 retain
因为我自己在循环,但它类似于如果有一个 retain prev_score;
这是一个正常的数据步骤循环.我实际上并没有保留
它,因为我希望它在遇到新用户时丢失.
First we loop through the dataset to get the first score found, so we can grab that for the initial prev_score value. Then setting that, and re-looping through the rows for that user and outputting. There's no actual retain
here since I am doing the looping myself, but it's similar to if there were a retain prev_score;
and this was a normal data step loop. I don't actually retain
it since I want it to go missing when a new user is met.
data want;
do _n_ = 1 by 1 until (last.user);
set have;
by user;
if missing(first_score) and not missing(score) then
first_score = score;
end;
prev_score = first_score;
do _n_ = 1 by 1 until (last.user);
set have;
by user;
if missing(score) then
score = prev_score;
prev_score = score;
output;
end;
run;
这篇关于在 SAS 中使用具有滞后的前向后向方法填充缺失值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!