提取时间重新加入 [英] Extracting time to re-join

查看：60 发布时间：2021/5/2 20:56:07 mysql sql r dplyr

本文介绍了提取时间重新加入的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我的表格结构如下，其中包含有关俱乐部会员资格的信息

  id开始结束已取消1 2020-01-01 2020-12-31 2021-01-101 2021-02-01 2022-01-31不适用2 2020-01-01 2020-12-31不适用3 2020-01-01 2020-06-30 2020-07-013 2020-07-10 2021-01-09 2021-01-313 2021-02-02 2021-08-01不适用

我对之后重新加入的成员感兴趣.对于上述数据，在整个期间， id 1 都执行一次， id 2 则不执行，而 id 3 则执行两次./p>

对于每次发生这种情况，他们的成员资格终止与重新加入之间的时间是多少?对于以上数据，应为:

  id rejoin_date time_to_rejoin1 2021-01-31 22天3 2020-07-10 9天3 2021-02-02 2天

对于R:

  structure(list(id = c(1，1，2，3，3，3)，start = c("2020-01-01"，"2021-02-01"，"2020-01-01"，"2020-01-01"，"2020-07-10"，"2021-02-02")，end = c("2020-12-31"，"2022-01-31"，"2020-12-31"，"2020-06-30"，已取消的"2021-01-09"，"2021-08-01")= c("2021-01-10"，NA，NA，"2020-07-01"，"2021-01-31"，NA))，类="data.frame"，row.names = c(NA，-6L))

对于SQL:

 在不存在`dt`的情况下创建表(`id` int(6)unsigned NOT NULL，开始"时间戳，`结束`时间戳，取消TIMESTAMP空)DEFAULT CHARSET = utf8;将值插入`dt`(`id`，`start`，`end`，`cancelled`)值("1"，"2020-01-01"，"2020-12-31"，"2021-01-10")，('2'，'2021-02-01'，'2022-01-31'，NULL)，('2'，'2021-01-01'，'2020-12-31'，NULL)，("3"，"2020-01-01"，"2020-06-30"，"2020-07-01")，("3"，"2021-02-10"，"2021-01-09"，"2021-01-31")，('3'，'2021-02-02'，'2021-08-01'，NULL)

对于使用R(理想情况下使用dplyr)或SQL(MySQL)的答案，我会感到满意

解决方案

您可以使用 lag()然后进行过滤:

 选择t.*，datediff(开始，prev_cancelled)为num_days_since_cancel从(选择t.*，滞后(取消)超过(按开始按ID顺序对分区进行划分)为prev_cancelled从T)吨其中prev_cancelled不为null；

此处是db小提琴.

I have a table structured as follows, which contains information about club memberships

id  start       end          cancelled
1   2020-01-01  2020-12-31   2021-01-10
1   2021-02-01  2022-01-31   NA
2   2020-01-01  2020-12-31   NA
3   2020-01-01  2020-06-30   2020-07-01
3   2020-07-10  2021-01-09   2021-01-31
3   2021-02-02  2021-08-01   NA

I am interested in members that re-joined afterwards. For the above data, for the entire period, id 1 did this once, id 2 did not, and id 3 did so twice.

Ffor each time that this occurred, what was the time between their membership ending, and them rejoining ? For the above data, this would be:

id  rejoin_date time_to_rejoin
1   2021-01-31  22 days 
3   2020-07-10  9 days
3   2021-02-02  2 days

For R:

structure(list(id = c(1, 1, 2, 3, 3, 3), start = c("2020-01-01", 
"2021-02-01", "2020-01-01", "2020-01-01", "2020-07-10", "2021-02-02"
), end = c("2020-12-31", "2022-01-31", "2020-12-31", "2020-06-30", 
"2021-01-09", "2021-08-01"), cancelled = c("2021-01-10", NA, 
NA, "2020-07-01", "2021-01-31", NA)), class = "data.frame", row.names = c(NA, 
-6L))

For SQL:

CREATE TABLE IF NOT EXISTS `dt` (
  `id` int(6) unsigned NOT NULL,
  `start` TIMESTAMP,
  `end` TIMESTAMP,
  `cancelled` TIMESTAMP NULL
) DEFAULT CHARSET=utf8;
INSERT INTO `dt` (`id`, `start`, `end`, `cancelled`) VALUES
  ('1', '2020-01-01', '2020-12-31', '2021-01-10'),
  ('2', '2021-02-01', '2022-01-31', NULL ),
  ('2', '2021-01-01', '2020-12-31', NULL ),
  ('3', '2020-01-01', '2020-06-30', '2020-07-01'),
  ('3', '2021-02-10', '2021-01-09', '2021-01-31'),
  ('3', '2021-02-02', '2021-08-01', NULL )

I would be happy with answers that use either R (ideally using dplyr) or SQL (MySQL)

解决方案

You can use lag() and then filter:

select t.*,
       datediff(start, prev_cancelled) as num_days_since_cancel
from (select t.*,
             lag(cancelled) over (partition by id order by start) as prev_cancelled
      from t
     ) t
where prev_cancelled is not null;

Here is a db<>fiddle.

这篇关于提取时间重新加入的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

提取时间重新加入 [英] Extracting time to re-join

问题描述

相关文章

数据库最新文章

热门教程

热门工具

登录关闭

提取时间重新加入 [英] Extracting time to re-join

问题描述

相关文章

数据库最新文章

热门教程

热门工具

登录 关闭

登录关闭