根据不同的组,用上一个或下一个观察值替换缺失值 [英] Replace missing value with previous or next observation, based on different groups

查看:39
本文介绍了根据不同的组,用上一个或下一个观察值替换缺失值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个示例数据集.下面的场景代表了一个账户交易历史列表:

  • 两个客户 A 和 B;
  • 客户 A 在银行 BA 和银行 UB 有 5 个账户 1、2、3、4、5;
  • 客户 B 在 WF 银行有 2 个账户 1、6;
  • 请注意,WF 银行已向 B 颁发了与 A 在银行 BA 相同的帐户 ID(这在现实中可能是真的).​​

    数据有;输入客户$ Account_id Bank_id $年;数据线;A 1 BA .一个 2 UB .A 3 UB 2012一个 3 UB .一个 4 UB .A 4 UB 2013一个 5 UB.B 1 WF 2014B 1 WF.B 6 WF.;

有些账户开户日期缺失,但对于同一个账户,开户日期应该是一样的.我想要做的是用以前/以后的值替换帐户的缺失日期(如果有);否则,保持为空.

 数据要;输入客户$ Account_id Bank_id $年;数据线;A 1 BA .一个 2 UB .A 3 UB 2012A 3 UB 2012A 4 UB 2013A 4 UB 2013一个 5 UB.B 1 WF 2014B 1 WF 2014B 6 WF.;

我看了这个例子:如何将第一个非缺失值写入第一个缺失的观察值,这很有帮助,但我无法调整它以适用于我的案例,它有多个组.

解决方案

您可以在单个 proc sql 步骤中完成此操作:

<前>进程 sql ;创建表 want2 作为选择 a.Customer, a.Account_id, a.Bank_id, max(a.Year,b.Year) 作为年份从有一个左连接(选择 Customer、Account_id、Bank_id、max(Year) 作为年份从有group by Customer, Account_id, Bank_id) b on a.Customer = b.Customer和 a.Account_id = b.Account_id和 a.Bank_id = b.Bank_id ;放弃 ;

Here is a sample dataset. The scenario below represents a list of account transaction history:

  • Two customers A and B;
  • Customer A has 5 accounts 1, 2, 3, 4, 5 at Bank BA and Bank UB;
  • Customer B has 2 accounts 1, 6 at Bank WF;
  • Note that Bank WF has issued an account ID to B that's same as A has at Bank BA (which could be true in reality).

    data have;
     input Customer $ Account_id Bank_id $ Year;
    datalines;
    A  1  BA    .
    A  2  UB    .
    A  3  UB 2012
    A  3  UB    .
    A  4  UB    .
    A  4  UB 2013
    A  5  UB    .
    B  1  WF 2014
    B  1  WF    .
    B  6  WF    .
    ;
    

Some of the account open dates are missing, but for the same account, the open date should be the same. What I'm trying to do is to replace the missing date of an account with its previous/later value, if there's any; otherwise, keep it null.

    data want;
     input Customer $ Account_id Bank_id $ Year;
    datalines;
    A  1  BA    .
    A  2  UB    .
    A  3  UB 2012
    A  3  UB 2012
    A  4  UB 2013
    A  4  UB 2013
    A  5  UB    .
    B  1  WF 2014
    B  1  WF 2014
    B  6  WF    .
    ;

I looked at this example: How to write first non-missing value to first missing observations which was helpful, but I can't tweak it to work for my case, which has multiple groups.

解决方案

You can do this in a single proc sql step :

proc sql ;
  create table want2 as
  select a.Customer, a.Account_id, a.Bank_id, max(a.Year,b.Year) as Year
  from have a
       left join
       (select Customer, Account_id, Bank_id, max(Year) as Year
        from have
        group by Customer, Account_id, Bank_id) b on a.Customer = b.Customer
                                                    and a.Account_id = b.Account_id
                                                    and a.Bank_id = b.Bank_id ;
quit ;

这篇关于根据不同的组,用上一个或下一个观察值替换缺失值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆