Split date range into several chunks ending on YYYY-12-31


Problem description

df <- data.frame(group = c("a", "a", "b", "b"),
                 start = c("2017-05-01", "2019-04-03", "2011-03-03", "2014-05-07"),
                 end = c("2018-09-01", "2020-04-03", "2012-05-03", "2016-04-02"))  

Let's say I have the following df:

  group      start        end
1     a 2017-05-01 2018-09-01
2     a 2019-04-03 2020-04-03
3     b 2011-03-03 2012-05-03
4     b 2014-05-07 2016-04-02

I want to get it into this format, with each record split at the year boundaries: the first chunk runs from the start date to 31/12 of that year, each following chunk covers one subsequent year, and the last chunk stops at the original end date:

  group      start        end
1     a 2017-05-01 2017-12-31
2     a 2018-01-01 2018-09-01
3     a 2019-04-03 2019-12-31
4     a 2020-01-01 2020-04-03
5     b 2011-03-03 2011-12-31
6     b 2012-01-01 2012-05-03
7     b 2014-05-07 2014-12-31
8     b 2015-01-01 2015-12-31
9     b 2016-01-01 2016-04-02

Any ideas on how to tackle this?

Edit:

My primary concern is not date ranges that fall within a single year. However, as chinsoon12 pointed out, it would indeed be helpful if the approach could handle them as well, as for instance in this dataset:

df <- data.frame(group = c("a", "a", "b", "b", "c"),
                 start = c("2017-05-01", "2019-04-03", "2011-03-03", "2014-05-07", "2017-02-01"),
                 end = c("2018-09-01", "2020-04-03", "2012-05-03", "2016-04-02", "2017-04-05")) 

The end result would leave the last row as it was:

   group      start        end
1      a 2017-05-01 2017-12-31
2      a 2018-01-01 2018-09-01
3      a 2019-04-03 2019-12-31
4      a 2020-01-01 2020-04-03
5      b 2011-03-03 2011-12-31
6      b 2012-01-01 2012-05-03
7      b 2014-05-07 2014-12-31
8      b 2015-01-01 2015-12-31
9      b 2016-01-01 2016-04-02
10     c 2017-02-01 2017-04-05  


Recommended answer

A possible solution with data.table:

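library(data.table)
setDT(df)  # start and end must already be Date columns (see "Used data")

# repeat each row once per calendar year it touches,
# then clamp every copy to its own year within each original range
df[df[, rep(.I, 1 + year(end) - year(start))]
   ][, `:=` (start = pmax(start[1], as.Date(paste0(year(start[1]) + 0:(.N-1), '-01-01'))),
             end = pmin(end[.N], as.Date(paste0(year(end[.N]) - (.N-1):0, '-12-31'))))
     , by = .(group, rleid(start))][]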

which gives:


    group      start        end
 1:     a 2017-05-01 2017-12-31
 2:     a 2018-01-01 2018-09-01
 3:     a 2019-04-03 2019-12-31
 4:     a 2020-01-01 2020-04-03
 5:     b 2011-03-03 2011-12-31
 6:     b 2012-01-01 2012-05-03
 7:     b 2014-05-07 2014-12-31
 8:     b 2015-01-01 2015-12-31
 9:     b 2016-01-01 2016-04-02
10:     c 2017-02-01 2017-04-05
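
To see why this works, it can help to look at the row-expansion step on its own. The following is a minimal sketch, assuming the same five-row dataset as above with start and end already converted to Date; the names dt, n_years, idx and expanded are introduced here for illustration only and are not part of the original answer.

```r
library(data.table)

# Same data as in the answer, with start/end stored as Date
dt <- data.table(
  group = c("a", "a", "b", "b", "c"),
  start = as.Date(c("2017-05-01", "2019-04-03", "2011-03-03", "2014-05-07", "2017-02-01")),
  end   = as.Date(c("2018-09-01", "2020-04-03", "2012-05-03", "2016-04-02", "2017-04-05"))
)

# Number of calendar years each range touches
n_years <- dt[, 1 + year(end) - year(start)]

# Row numbers, each repeated once per calendar year touched
idx <- dt[, rep(.I, 1 + year(end) - year(start))]

# Expanded table: every copy still carries the original start and end,
# which the grouped `:=` step then clamps to a single year
expanded <- dt[idx]
print(expanded)
```

Within each block of copies, start[1] and end[.N] are the original endpoints, so pmax and pmin only change the copies that need a 1 January start or a 31 December end.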


Two alternatives:

# alternative 1:
df[, ri := rowid(group)
   ][df[, rep(.I, 1 + year(end) - year(start))]
     ][, `:=` (start = if (.N == 1) start else c(start[1], as.Date(paste0(year(start[1]) + 1:(.N-1), '-01-01') )),
               end = if (.N == 1) end else c(as.Date(paste0(year(end[.N]) - (.N-1):1, '-12-31') ), end[.N]))
       , by = .(group, ri)][, ri := NULL][]

# alternative 2:
df[, ri := rowid(group)
   ][df[, rep(.I, 1 + year(end) - year(start))]
     ][, `:=` (start = pmax(start[1], as.Date(paste0(year(start[1]) + 0:(.N-1), '-01-01'))),
               end = pmin(end[.N], as.Date(paste0(year(end[.N]) - (.N-1):0, '-12-31'))))
       , by = .(group, ri)][, ri := NULL][]
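
For readers who want to check the logic without data.table, the same clamping idea can be sketched in base R. This is not part of the original answer: split_by_year is a hypothetical helper name, and the sketch assumes start and end are Date columns with start <= end in every row.

```r
# Hypothetical helper: split one [start, end] range at calendar-year boundaries
split_by_year <- function(start, end) {
  years <- as.integer(format(start, "%Y")):as.integer(format(end, "%Y"))
  data.frame(
    # first chunk keeps the original start, later chunks begin on 1 January
    start = pmax(start, as.Date(paste0(years, "-01-01"))),
    # last chunk keeps the original end, earlier chunks finish on 31 December
    end   = pmin(end, as.Date(paste0(years, "-12-31")))
  )
}

df <- data.frame(
  group = c("a", "a", "b", "b", "c"),
  start = as.Date(c("2017-05-01", "2019-04-03", "2011-03-03", "2014-05-07", "2017-02-01")),
  end   = as.Date(c("2018-09-01", "2020-04-03", "2012-05-03", "2016-04-02", "2017-04-05"))
)

# Split every row and stack the pieces; indexing by row keeps the Date class
pieces <- lapply(seq_len(nrow(df)), function(i) {
  data.frame(group = df$group[i], split_by_year(df$start[i], df$end[i]))
})
result <- do.call(rbind, pieces)
print(result)
```

A row-by-row loop like this is likely slower than the data.table versions on large inputs, but it makes the pmax/pmin clamping easy to inspect one range at a time.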






Used data:

df <- data.frame(group = c("a", "a", "b", "b", "c"),
                 start = c("2017-05-01", "2019-04-03", "2011-03-03", "2014-05-07", "2017-02-01"),
                 end = c("2018-09-01", "2020-04-03", "2012-05-03", "2016-04-02", "2017-04-05")) 
df[2:3] <- lapply(df[2:3], as.Date)
