如何在 SAS 上计算过去 90 天内的不同 ID? [英] How to count distinct ID in the last 90 days on SAS?

查看:54
本文介绍了如何在 SAS 上计算过去 90 天内的不同 ID?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个这样的数据集:

I have a dataset like this:

CustomerID  AccountManager TransactionID  Transaction_Time 
1111111111  FA001          TR2016001      08SEP16:11:19:25
1111111111  FA001          TR2016002      26OCT16:08:22:49
1111111111  FA002          TR2016003      04NOV16:08:05:36
1111111111  FA003          TR2016004      04NOV16:17:15:52
1111111111  FA004          TR2016005      25NOV16:13:04:16
1231231234  FA005          TR2016006      25AUG15:08:03:29
1231231234  FA005          TR2016007      16SEP15:08:24:24
1231231234  FA008          TR2016008      18SEP15:14:42:29

CustomerID 代表每个客户,每个客户可以有多个交易.每个客户经理也可以处理多笔交易.但是transactionID在这个表中是唯一的.

CustomerID represents each customer, each customer could have multiple transactions. Each account manager could deal with multiple transactions too. But transactionID is unique in this table.

现在我想为每个客户计算交易发生的时间,如果我回到过去 90 天,有多少不同的客户经理参与以及发生了多少交易.我要找的结果是这样的:

Now I would like to count for each customer, when the transation happened, if I went back to last 90 days, how many distinct Account Manager get involved and how many transactions happened. The result I am looking for is like this:

CustomerID  Manager TransacID  Transaction_Time    CountTransac CountManager
1111111111  FA001   TR2016001  08SEP16:11:19:25    1            1
1111111111  FA001   TR2016002  26OCT16:08:22:49    2            1
1111111111  FA002   TR2016003  04NOV16:08:05:36    3            2
1111111111  FA003   TR2016004  04NOV16:17:15:52    4            3
1111111111  FA004   TR2016005  25NOV16:13:04:16    5            4
1231231234  FA005   TR2016006  25AUG15:08:03:29    1            1
1231231234  FA005   TR2016007  16SEP15:08:24:24    2            1
1231231234  FA008   TR2016008  18SEP15:14:42:29    3            2

现在使用以下代码,我弄清楚了如何计算交易计数,但我不知道如何计算不同的经理计数.如果有人可以帮助我,将不胜感激.非常感谢.

Now using the following code, I figure out how to calculate transaction count, but I did not know how to calculate the distinct manager count. It would be highly appreciated if someone could help me out. Thanks a lot.

DATA want;
    SET transaction;
    COUNT=1;
    DO point=_n_-1 TO 1 BY -1;
        SET want(KEEP=CustomerID Transaction_Time COUNT POINT=point
            RENAME=(CustomerID =SAME_ID Transaction_Time =OTHER_TIME COUNT=OTHER_COUNT));

        IF CustomerID NE SAME_ID 
            OR INTCK ("DAY", DATEPART(OTHER_TIME), DATEPART(Transaction_Time )) > 90 
            THEN LEAVE;   
        COUNT + OTHER_COUNT;
    END;
    DROP SAME_ID OTHER_TIME OTHER_COUNT;
    RENAME COUNT=COUNT_TRANSAC;
RUN;

推荐答案

您的代码根本无法正常工作,但我知道您想要做什么.这是有效的方法.我注释掉了 WHERE 语句,以便您可以看到它产生了您要求的结果.如果您真的只需要过去 90 天,则需要 WHERE 语句.

Your code does not work at all as it is, but I see what you want to do. Here is something that does work. I commented out the WHERE statement so you can see that it produces the result you asked for. You need the WHERE statement if you really want just the last 90 days.

* Always a good idea to sort first unless you are CERTAIN that
* your values are in the order you want.;
proc sort data=have;
    by customerid AccountManager transactionid;
run;

DATA want;
    SET have;
* Uncomment the WHERE statement to activate the 90-day time frame.;
*   where today()-datepart(transaction_time)<=90;
    by customerid AccountManager transactionid;
    if first.customerid
     then do;
        counttransac=0;
        countmanager=0;
     end;
    if first.AccountManager
     then countmanager+1;

    counttransac+1;

RUN;

利用 SAS 的 BY 语句和 first.last. 变量修饰符,您可以在每次看到新客户 ID 和经理 ID.

Taking advantage of SAS's BY statement and the first. and last. variable modifiers, you can reset your counter each time you see a new customer ID and manager ID.

好的,这要困难得多.这是在每次交易之前回顾历史的代码.我明白您为什么使用两个 SET 语句,因为您必须将数据集连接到自身.也许你可以用 PROC SQL 做到这一点,但我没有时间检查它.如果这对您有用,请告诉我.

Okay, that's much more difficult. Here is code that looks back at the history before each transaction. I see why you were using two SET statements because you have to join the dataset to itself. Probably you can do this with PROC SQL, but I didn't have time to check it out. Let me know if this works for you.

* Sort each customer's and manager's transactions;
proc sort data=transaction;
    by customerid accountmanager;
run;


DATA want;
    SET transaction nobs=pmax;
    by customerid;

    length lastmgr $ 100;
    retain pstart;      * Starting row for each customer;

    * Save starting row for each customer;
    if first.customerid
     then pstart=_n_;

    * Initialize current account manager and counters for
    * managers and transactions. The current transaction always
    * counts as one transaction and one manager.
    * Save the beginning of the 90-day period to avoid 
    * recalculating it each time.;
    lastmgr=accountmanager;
    mgrct=1;
    tranct=1;
    ninetyday=datepart(transaction_time)-90;

    * Set the starting row to search for each transaction;
    p=pstart;

    * Loop through all rows for the customer and only count
    * those that occur before the current transaction and
    * after the 90-day period before it.;
    * Note that the transactions are not necessarily sorted
    * in chronological order but rather in groups by customer
    * and manager, so we have to look through all of the
    * customer's transactions each time.;
    * DO UNTIL(0) means loop forever, so be careful that
    * there is always a LEAVE statement executed.;
    do until(0);

        * p > pmax means the end of the transaction list, so stop.;
        if p > pmax
         then leave;

        set transaction (keep=customerid accountmanager transaction_time
                  rename=(customerid=cust2 accountmanager=mgr2 transaction_time=tt2))
            point=p;

        * When customer ID changes, we are done with the loop.;
        if cust2 ~= customerid
         then leave;
         else do;
            * To be counted, the transaction needs to be within the 
            * 90-day period. Using "<" for the transaction time pre-
            * vents counting the current transaction twice.;
            if datepart(tt2) >= ninetyday and tt2 < transaction_time
             then do;
                tranct=tranct+1;
                if mgr2 ~= lastmgr
                 then do;
                    mgrct=mgrct+1;
                    lastmgr=mgr2;
                 end;
             end;
          end;

        * Look at the next transaction.;
        p=p+1;

    end;

    keep CustomerID AccountManager TransactionID Transaction_Time tranct mgrct;

RUN;

这是一种有效的 PROC SQL 方法.这是汤姆在这里回答我的问题关于如何创建一个优雅的查询来完成您的任务:

Here is a PROC SQL approach that works. It's by Tom in answer to my question here about how to create an elegant query to accomplish your task:

proc sql noprint ;
 create table want as
   select a.*
        , count(distinct b.accountmanager) as mgrct
        , count(*) as tranct
   from transaction a
   left join transaction b
   on a.customerid = b.customerid
    and b.transaction_time <= a.transaction_time
    and datepart(a.transaction_time)-datepart(b.transaction_time)
        between 0 and 90
   group by 1,2,3,4
 ;
quit;

这篇关于如何在 SAS 上计算过去 90 天内的不同 ID?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆