从 Teradata 到 PROC SQL 的 SQL 语句语法 [英] SQL statment syntax from Teradata to PROC SQL

查看:71
本文介绍了从 Teradata 到 PROC SQL 的 SQL 语句语法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是 SAS 新手,我正在尝试使用 proc sql 构建一些查询.

I am new to SAS and I am trying to build some query using proc sql.

我在 sql 中有以下代码(teradata 语法):

I have the following code in sql (teradata syntax):

create multiset volatile table mvt_data, no log
( date1     date format 'yyyy-mm-dd'
, flag1     byteint
,cust_id    varchar(25)
)
primary index (date1)
on commit preserve rows;

insert into mvt_data values (date '2020-01-03', 1, 'A');
insert into mvt_data values (date '2020-02-04', 0, 'A');
insert into mvt_data values (date '2020-04-05', 0, 'B');
insert into mvt_data values (date '2020-01-19', 1, 'C');
insert into mvt_data values (date '2020-03-20', 1, 'B');
insert into mvt_data values (date '2020-06-20', 1, 'D');

我想要以下输出:

对于 date1 的每个 Cust_id 和 YYYYMM 求和过去 6 个月累积的 Flag1 和总 Flag1.

For each Cust_id and YYYYMM of date1 sum the Flag1 and sum Flag1 cumulative with respect to last 6 months.

我会用同样的语法:

with cte_data_ts (ts1, Flag1) as
(
select cast(date1 as timestamp(0)), Flag1
  from mvt_data
)
  ,  cte_gbt (YearMonth, SumFlag1, cust_id) as
(
  select cust_id                                          as cust_id
       ,to_char(begin($TD_TIMECODE_RANGE) at 0, 'yyyymm') as YearMonth
       , sum(flag1)                                       as SumFlag1
    from cte_data_ts
group by time(cal_months(1))
   using timecode(ts1)
    fill (0)
)
  select cust_id, YearMonth, SumFlag1
       , sum(SumFlag1) over(order by cust_id, YearMonth asc rows between 6 preceding and current row) as SumFlag1_last6Months
    from cte_gbt;

不幸的是,我知道 proc sql 中不存在 over(),所以有人能帮我实现相同的结果吗?

Unfortunatelly I know that over() does not exists in proc sql, so can anyone help me achieve the same result ?

我添加了一个新列 (Cust_ID)

I added a new column (Cust_ID)

推荐答案

所以首先让我们将 SQL 数据加载转换为正常的 SAS 数据步骤,以便处理一些数据.

So first let's convert the SQL data load into normal SAS data step to have some data to work with.

data mvt_data;
  input date1 :yymmdd. flag1 cust_id $;
  format date1 yymmdd10.;
cards;
2020-01-03 1 A
2020-02-04 0 A
2020-04-05 0 B
2020-01-19 1 C
2020-03-20 1 B
2020-06-20 1 D
;

这是一种生成未出现在源中的月份的观察值的方法.

Here is a way to generate observations for months that don't appear in the source.

data YearMonth;
  min='01JAN2020'd ;
  max='01JUN2020'd;
  do offset=0 to intck('month',min,max);
    do YearMonth=intnx('month',min,offset,'b') ;
      output;
    end;
  end;
  keep YearMonth ;
  format YearMonth yymmn6.;
run;

或者计算 DATE1 的最小值/最大值并使用它们来设置日期范围.

Or calculate the min/max values of DATE1 and use those to set the range of dates.

如果您没有其他来源的 Cust_id 值列表,您可以查询数据

If you don't have another source for the list of Cust_id values you can query the data

select distinct cust_id from mvt_data

所以加入两者以获得完整的 YearMonth*Cust_id 组合.

So join the two to get the full YearMonth*Cust_id combinations.

select x.cust_id,y.YearMonth 
   from (select distinct cust_id from mvt_data) x
      , YearMonth y

现在将其与实际数据结合以获得每月设置 FLAG1 的计数.您可以利用 SAS 将布尔表达式计算为 0/1 值这一事实来简化代码.(在通用 SQL 实现中,您需要使用 CASE)因此,当标记观察的日期在间隔内时,它对 SUM() 贡献 1.

Now join that with the actual data to get the count number where FLAG1 is set per month. You can use the fact that SAS evaluates boolean expressions to 0/1 values to make the code simpler. (In generic SQL implementation you would need to use CASE) So when the date for the flagged observation is in the interval then it contributes 1 to the SUM().

proc sql ;
 create table want as
 select a.cust_id
      , a.YearMonth
      , sum(b.date1 between intnx('month',a.YearMonth,0,'b')
                        and intnx('month',a.YearMonth,0,'e') ) as SumFlag1
      , sum(b.date1 between intnx('month',a.YearMonth,-6,'b')
                        and intnx('month',a.YearMonth,0,'e') ) as SumFlag1_last6Months 
      , sum(b.date1 between intnx('month',a.YearMonth,-12,'b')
                        and intnx('month',a.YearMonth,0,'e') ) as SumFlag1_last12Months 
 from 
  (select x.cust_id,y.YearMonth 
   from (select distinct cust_id from mvt_data) x
      , YearMonth y
  ) a
 left join mvt_data b
   on a.cust_id = b.cust_id and b.Flag1=1
 group by 1,2
 ;
quit;

结果:

                    Year     Sum      SumFlag1_       SumFlag1_
Obs    cust_id     Month    Flag1    last6Months    last12Months

  1       A       202001      1           1               1
  2       A       202002      0           1               1
  3       A       202003      0           1               1
  4       A       202004      0           1               1
  5       A       202005      0           1               1
  6       A       202006      0           1               1
  7       B       202001      0           0               0
  8       B       202002      0           0               0
  9       B       202003      1           1               1
 10       B       202004      0           1               1
 11       B       202005      0           1               1
 12       B       202006      0           1               1
 ...

这篇关于从 Teradata 到 PROC SQL 的 SQL 语句语法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆