删除在移动时间窗口内识别的行 [英] Drop rows identified within moving time window

查看:52
本文介绍了删除在移动时间窗口内识别的行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个住院数据(法术")——每个法术 1 行.我想在一个星期内一个接一个地删除任何记录的法术(可能有多个)——理由是它们可能是同一根本原因的症状.这是一些播放数据:

I have a dataset of hospitalisations ('spells') - 1 row per spell. I want to drop any spells recorded within a week after another (there could be multiple) - the rationale being is that they're likely symptomatic of the same underlying cause. Here is some play data:

create table hif_user.rzb_recurse_src (
patid integer not null,
eventdate integer not null,
type smallint not null
);

insert into hif_user.rzb_recurse_src values (1,1,1);
insert into hif_user.rzb_recurse_src values (1,3,2);
insert into hif_user.rzb_recurse_src values (1,5,2);
insert into hif_user.rzb_recurse_src values (1,9,2);
insert into hif_user.rzb_recurse_src values (1,14,2);
insert into hif_user.rzb_recurse_src values (2,1,1);
insert into hif_user.rzb_recurse_src values (2,5,1);
insert into hif_user.rzb_recurse_src values (2,19,2);

只有类型 2 的法术 - 在任何其他法术之后的一周内 - 会被丢弃.类型 1 法术将保留.

Only spells of type 2 - within a week after any other - are to be dropped. Type 1 spells are to remain.

对于患者 1,日期 1 &9 应该保留.对于患者 2,应保留所有行.

For patient 1, dates 1 & 9 should be kept. For patient 2, all rows should remain.

问题出在患者 1 上.由于法术日期 9 接近于法术日期 5,因此识别出法术日期 9;然而,由于法术日期 5 接近于法术日期 1,因此应该放弃法术日期 9,因此允许法术日期 9 继续存在......

The issue is with patient 1. Spell date 9 is identified for dropping as it is close to spell date 5; however, as spell date 5 is close to spell date 1 is should be dropped therefore allowing spell date 9 to live...

所以,这似乎是一个递归问题.但是,我之前没有在 SQL 中使用过递归编程,而且我正在努力想象如何做到这一点.任何人都可以帮忙吗?我应该补充一点,我使用的 Teradata 比大多数递归 SQL 的限制更多(我相信只允许 UNION ALL 集).

So, it seems a recursive problem. However, I've not used recursive programming in SQL before and I'm struggling to really picture how to do it. Can anyone help? I should add that I'm using Teradata which has more restrictions than most with recursive SQL (only UNION ALL sets allowed I believe).

推荐答案

这是一个游标逻辑,一行一行检查是否符合您的规则,因此递归是解决您问题的最简单(也许是唯一)方法.

It's a cursor logic, check one row after the other if it fits your rules, so recursion is the easiest (maybe the only) way to solve your problem.

为了获得不错的性能,您需要一个易失性表来促进这种逐行处理:

To get a decent performance you need a Volatile Table to facilitate this row-by-row processing:

CREATE VOLATILE TABLE vt (patid, eventdate, exac_type, rn, startdate) AS
(
SELECT r.*
   ,ROW_NUMBER() -- needed to facilitate the join
    OVER (PARTITION BY patid ORDER BY eventdate) AS rn
FROM hif_user.rzb_recurse_src AS r
) WITH DATA ON COMMIT PRESERVE ROWS;

WITH RECURSIVE cte (patid, eventdate, exac_type, rn, startdate) AS
 (
   SELECT vt.*
     ,eventdate AS startdate 
   FROM vt
   WHERE rn = 1 -- start with the first row

   UNION ALL

   SELECT vt.*
     -- check if type = 1 or more than 7 days from the last eventdate
     ,CASE WHEN vt.eventdate > cte.startdate + 7  
             OR vt.exac_type = 1
           THEN vt.eventdate   -- new start date
           ELSE cte.startdate  -- keep old date
      END
   FROM vt JOIN cte
     ON vt.patid = cte.patid
    AND vt.rn = cte.rn + 1 -- proceed to next row
 )    
SELECT * 
FROM cte
WHERE eventdate - startdate = 0 -- only new start days
order by patid, eventdate

这篇关于删除在移动时间窗口内识别的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆