改进 SAS SQL 查询的想法 [英] Ideas to improve SAS SQL query
问题描述
这是基于我回答的一个问题 这里 关于如何使用SAS解决以下问题.[注意:有一个有效的 SAS 数据步骤解决方案,但我只是想看看它如何在 SQL 中工作.]
问题(有点难以描述)是查看客户的每笔交易,从该交易日期开始回溯 90 天,计算该 90 天窗口内客户的交易总数,然后保存该数字原始交易,计算在该 90 天窗口内处理客户交易的不同客户经理的数量,并将该数字与原始交易一起保存.
这是初始化测试事务数据集和我的 SAS SQL 解决方案的数据步骤:
数据交易;长度 customerid $12 accountmanager $7 transactionid $12;输入 CustomerID AccountManager TransactionID Transaction_Time 日期时间.格式化交易时间日期时间.数据线;1111111111 FA001 TR2016001 08SEP16:11:19:251111111111 FA001 TR2016002 26OCT16:08:22:491111111111 FA002 TR2016003 04NOV16:08:05:361111111111 FA003 TR2016004 04NOV16:17:15:521111111111 FA004 TR2016005 25NOV16:13:04:161231231234 FA005 TR2016006 25AUG15:08:03:291231231234 FA005 TR2016007 16SEP15:08:24:241231231234 FA008 TR2016008 18SEP15:14:42:29;;;;跑步;进程sql;创建表想要作为选择 mgrs.*, trans.tranct从(选择 t3.customerid、t3.accountmanager、t3.transactionid、t3.transaction_time、count(*) 作为 mgrct从(选择不同的 t1.*, t2.accountmanager 作为 m2从交易 t1,交易 t2其中 t1.customerid=t2.customerid和datepart(t1.transaction_time) >= datepart(t2.transaction_time)-90和t2.transaction_time <= t1.transaction_time) t3按 t3.customerid, t3.accountmanager, t3.transactionid, t3.transaction_time 分组) mgrs,(选择 t4.customerid、t4.transactionid、count(*) 作为 tranct从交易 t4,交易 t5其中 t4.customerid=t5.customerid和datepart(t4.transaction_time) >= datepart(t5.transaction_time)-90和t5.transaction_time <= t4.transaction_timegroup by t4.customerid, t4.transactionid) trans其中 mgrs.customerid=trans.customerid 和 mgrs.transactionid=trans.transactionid;放弃;
结果如下:
<块引用>customerid accountmanager transactionid Transaction_Time mgrct tranct1111111111 FA001 TR2016001 08SEP16:11:19:25 1 11111111111 FA001 TR2016002 26OCT16:08:22:49 1 21111111111 FA002 TR2016003 04NOV16:08:05:36 2 31111111111 FA003 TR2016004 04NOV16:17:15:52 3 41111111111 FA004 TR2016005 25NOV16:13:04:16 4 51231231234 FA005 TR2016006 25AUG15:08:03:29 1 11231231234 FA005 TR2016007 16SEP15:08:24:24 1 21231231234 FA008 TR2016008 18SEP15:14:42:29 2 3
好久没用SQL了,想知道有没有更优雅的SQL方案.我以为会更简单,但实际上SAS数据步骤代码似乎比这个SQL查询更简单.
我认为您只需要加入表格即可.
proc sql noprint ;创建表想要作为选择一个.*, count(distinct b.accountmanager) as mgrct, count(*) 作为 tranct从交易a左连接事务 ba.customerid = b.customerid和 b.transaction_time <= a.transaction_time和日期部分(a.transaction_time)-日期部分(b.transaction_time)0 到 90 之间按 1,2,3,4 分组;放弃;
结果
1111111111 FA001 TR2016001 08SEP16:11:19:25 1 11111111111 FA001 TR2016002 26OCT16:08:22:49 1 21111111111 FA002 TR2016003 04NOV16:08:05:36 2 31111111111 FA003 TR2016004 04NOV16:17:15:52 3 41111111111 FA004 TR2016005 25NOV16:13:04:16 4 51231231234 FA005 TR2016006 25AUG15:08:03:29 1 11231231234 FA005 TR2016007 16SEP15:08:24:24 1 21231231234 FA008 TR2016008 18SEP15:14:42:29 2 3
This is based on a question I answered here about how to use SAS to solve the following problem. [NOTE: There is a working SAS data step solution, but I was just trying to see how it would work in SQL.]
The problem (a little hard to describe) is to look at each transaction for a customer, look back 90 days from that transaction's date, count the total number of the customer's transactions in that 90-day window, save that number with the original transaction, count the number of distinct account managers handling the customer's transactions in that 90-day window, and save that number with the original transaction.
Here is the data step to initialize a test transaction dataset and my SAS SQL solution:
data transaction;
length customerid $ 12 accountmanager $7 transactionid $ 12;
input CustomerID AccountManager TransactionID Transaction_Time datetime.;
format transaction_time datetime.;
datalines;
1111111111 FA001 TR2016001 08SEP16:11:19:25
1111111111 FA001 TR2016002 26OCT16:08:22:49
1111111111 FA002 TR2016003 04NOV16:08:05:36
1111111111 FA003 TR2016004 04NOV16:17:15:52
1111111111 FA004 TR2016005 25NOV16:13:04:16
1231231234 FA005 TR2016006 25AUG15:08:03:29
1231231234 FA005 TR2016007 16SEP15:08:24:24
1231231234 FA008 TR2016008 18SEP15:14:42:29
;;;;
run;
proc sql;
create table want as
select mgrs.*, trans.tranct
from
(select t3.customerid, t3.accountmanager, t3.transactionid, t3.transaction_time, count(*) as mgrct
from (select distinct t1.*, t2.accountmanager as m2
from transaction t1, transaction t2
where t1.customerid=t2.customerid
and
datepart(t1.transaction_time) >= datepart(t2.transaction_time)-90
and
t2.transaction_time <= t1.transaction_time) t3
group by t3.customerid, t3.accountmanager, t3.transactionid, t3.transaction_time) mgrs,
(select t4.customerid, t4.transactionid, count(*) as tranct
from transaction t4, transaction t5
where t4.customerid=t5.customerid
and
datepart(t4.transaction_time) >= datepart(t5.transaction_time)-90
and
t5.transaction_time <= t4.transaction_time
group by t4.customerid, t4.transactionid) trans
where mgrs.customerid=trans.customerid and mgrs.transactionid=trans.transactionid;
quit;
The result looks like this:
customerid accountmanager transactionid Transaction_Time mgrct tranct 1111111111 FA001 TR2016001 08SEP16:11:19:25 1 1 1111111111 FA001 TR2016002 26OCT16:08:22:49 1 2 1111111111 FA002 TR2016003 04NOV16:08:05:36 2 3 1111111111 FA003 TR2016004 04NOV16:17:15:52 3 4 1111111111 FA004 TR2016005 25NOV16:13:04:16 4 5 1231231234 FA005 TR2016006 25AUG15:08:03:29 1 1 1231231234 FA005 TR2016007 16SEP15:08:24:24 1 2 1231231234 FA008 TR2016008 18SEP15:14:42:29 2 3
I haven't used SQL for a long time, so I would like to know if there is a more elegant SQL solution. I thought it would be simpler, but actually the SAS data step code seems simpler than this SQL query.
I think you just need to join the table with itself.
proc sql noprint ;
create table want as
select a.*
, count(distinct b.accountmanager) as mgrct
, count(*) as tranct
from transaction a
left join transaction b
on a.customerid = b.customerid
and b.transaction_time <= a.transaction_time
and datepart(a.transaction_time)-datepart(b.transaction_time)
between 0 and 90
group by 1,2,3,4
;
quit;
Results
1111111111 FA001 TR2016001 08SEP16:11:19:25 1 1
1111111111 FA001 TR2016002 26OCT16:08:22:49 1 2
1111111111 FA002 TR2016003 04NOV16:08:05:36 2 3
1111111111 FA003 TR2016004 04NOV16:17:15:52 3 4
1111111111 FA004 TR2016005 25NOV16:13:04:16 4 5
1231231234 FA005 TR2016006 25AUG15:08:03:29 1 1
1231231234 FA005 TR2016007 16SEP15:08:24:24 1 2
1231231234 FA008 TR2016008 18SEP15:14:42:29 2 3
这篇关于改进 SAS SQL 查询的想法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!