提取时间重新加入 [英] Extracting time to re-join

查看:60
本文介绍了提取时间重新加入的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的表格结构如下,其中包含有关俱乐部会员资格的信息

  id开始结束已取消1 2020-01-01 2020-12-31 2021-01-101 2021-02-01 2022-01-31不适用2 2020-01-01 2020-12-31不适用3 2020-01-01 2020-06-30 2020-07-013 2020-07-10 2021-01-09 2021-01-313 2021-02-02 2021-08-01不适用 

我对之后重新加入的成员感兴趣.对于上述数据,在整个期间, id 1 都执行一次, id 2 则不执行,而 id 3 则执行两次./p>

对于每次发生这种情况,他们的成员资格终止与重新加入之间的时间是多少?对于以上数据,应为:

  id rejoin_date time_to_rejoin1 2021-01-31 22天3 2020-07-10 9天3 2021-02-02 2天 

对于R:

  structure(list(id = c(1,1,2,3,3,3),start = c("2020-01-01","2021-02-01","2020-01-01","2020-01-01","2020-07-10","2021-02-02"),end = c("2020-12-31","2022-01-31","2020-12-31","2020-06-30",已取消的"2021-01-09","2021-08-01")= c("2021-01-10",NA,NA,"2020-07-01","2021-01-31",NA)),类="data.frame",row.names = c(NA,-6L)) 

对于SQL:

 在不存在`dt`的情况下创建表(`id` int(6)unsigned NOT NULL,开始"时间戳,`结束`时间戳,取消TIMESTAMP空)DEFAULT CHARSET = utf8;将值插入`dt`(`id`,`start`,`end`,`cancelled`)值("1","2020-01-01","2020-12-31","2021-01-10"),('2','2021-02-01','2022-01-31',NULL),('2','2021-01-01','2020-12-31',NULL),("3","2020-01-01","2020-06-30","2020-07-01"),("3","2021-02-10","2021-01-09","2021-01-31"),('3','2021-02-02','2021-08-01',NULL) 

对于使用R(理想情况下使用dplyr)或SQL(MySQL)的答案,我会感到满意

解决方案

您可以使用 lag()然后进行过滤:

 选择t.*,datediff(开始,prev_cancelled)为num_days_since_cancel从(选择t.*,滞后(取消)超过(按开始按ID顺序对分区进行划分)为prev_cancelled从T)吨其中prev_cancelled不为null; 

此处是db小提琴.

>

I have a table structured as follows, which contains information about club memberships

id  start       end          cancelled
1   2020-01-01  2020-12-31   2021-01-10
1   2021-02-01  2022-01-31   NA
2   2020-01-01  2020-12-31   NA
3   2020-01-01  2020-06-30   2020-07-01
3   2020-07-10  2021-01-09   2021-01-31
3   2021-02-02  2021-08-01   NA

I am interested in members that re-joined afterwards. For the above data, for the entire period, id 1 did this once, id 2 did not, and id 3 did so twice.

Ffor each time that this occurred, what was the time between their membership ending, and them rejoining ? For the above data, this would be:

id  rejoin_date time_to_rejoin
1   2021-01-31  22 days 
3   2020-07-10  9 days
3   2021-02-02  2 days

For R:

structure(list(id = c(1, 1, 2, 3, 3, 3), start = c("2020-01-01", 
"2021-02-01", "2020-01-01", "2020-01-01", "2020-07-10", "2021-02-02"
), end = c("2020-12-31", "2022-01-31", "2020-12-31", "2020-06-30", 
"2021-01-09", "2021-08-01"), cancelled = c("2021-01-10", NA, 
NA, "2020-07-01", "2021-01-31", NA)), class = "data.frame", row.names = c(NA, 
-6L))

For SQL:

CREATE TABLE IF NOT EXISTS `dt` (
  `id` int(6) unsigned NOT NULL,
  `start` TIMESTAMP,
  `end` TIMESTAMP,
  `cancelled` TIMESTAMP NULL
) DEFAULT CHARSET=utf8;
INSERT INTO `dt` (`id`, `start`, `end`, `cancelled`) VALUES
  ('1', '2020-01-01', '2020-12-31', '2021-01-10'),
  ('2', '2021-02-01', '2022-01-31', NULL ),
  ('2', '2021-01-01', '2020-12-31', NULL ),
  ('3', '2020-01-01', '2020-06-30', '2020-07-01'),
  ('3', '2021-02-10', '2021-01-09', '2021-01-31'),
  ('3', '2021-02-02', '2021-08-01', NULL )

I would be happy with answers that use either R (ideally using dplyr) or SQL (MySQL)

解决方案

You can use lag() and then filter:

select t.*,
       datediff(start, prev_cancelled) as num_days_since_cancel
from (select t.*,
             lag(cancelled) over (partition by id order by start) as prev_cancelled
      from t
     ) t
where prev_cancelled is not null;

Here is a db<>fiddle.

这篇关于提取时间重新加入的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆