Vectorizing "expansion of date range" per row in dplyr of R


Question

I have a dataset stored as a tibble in R, like the one below:

# A tibble: 50,045 x 5
   ref_key start_date end_date  
   <chr>   <date>     <date>    
 1 123     2010-01-08 2010-01-13
 2 123     2010-01-21 2010-01-23
 3 123     2010-03-10 2010-04-14

I need to create another tibble in which each row stores only one date, like the one below:

   ref_key date      
   <chr>   <date>    
 1 123     2010-01-08
 2 123     2010-01-09
 3 123     2010-01-10
 4 123     2010-01-11
 5 123     2010-01-12
 6 123     2010-01-13
 7 123     2010-01-21
 8 123     2010-01-22
 9 123     2010-01-23

Currently I am doing this with an explicit loop, like the one below:

for (loop in (1:nrow(input.df))) {
  if (loop%%100==0) {
    print(paste(loop,'/',nrow(input.df)))
  }
  temp.df.st00 <- input.df[loop,] %>% data.frame
  temp.df.st01 <- tibble(ref_key=temp.df.st00[,'ref_key'],
                    date=seq(temp.df.st00[,'start_date'],
                             temp.df.st00[,'end_date'],1))
  if (loop==1) {
    output.df <- temp.df.st01
  } else {
    output.df <- output.df %>%
      bind_rows(temp.df.st01)
  }
}

It works, but slowly: I have more than 50k rows to process, and the loop takes a few minutes to finish.

I wonder whether this step can be vectorized. Is it something related to rowwise in dplyr?

Recommended answer

We create a row name column (rownames_to_column), nest everything except 'rn' and 'ref_key', use mutate with map to build the sequence from 'start_date' to 'end_date' for each row, select out the columns that are no longer needed, and then unnest.

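library(tidyverse)
# df1 is the input tibble from the question
res <- df1 %>%
         # add a row id so rows sharing a ref_key stay separate
         rownames_to_column('rn') %>% 
         # nest every column except rn and ref_key (tidyr >= 1.0 syntax)
         nest(data = -c(rn, ref_key)) %>%
         # one daily sequence per row, stored as a list-column
         mutate(date = map(data, ~ seq(.x$start_date, .x$end_date, by = "1 day"))) %>%
         select(-data, -rn) %>%
         # expand the list-column to one row per date
         unnest(date)
head(res, 9)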
#  ref_key       date
#1     123 2010-01-08
#2     123 2010-01-09
#3     123 2010-01-10
#4     123 2010-01-11
#5     123 2010-01-12
#6     123 2010-01-13
#7     123 2010-01-21
#8     123 2010-01-22
#9     123 2010-01-23
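
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The three-row df1 is hypothetical sample data copied from the rows shown in the question, and the names res_map, res_vec and n_days are made up for illustration. It shows two variants that are not taken from the answer above: one builds a list-column with map2 and unnests it, the other avoids per-row calls altogether by repeating each row and adding day offsets. Treat both as sketches under those assumptions rather than as the definitive way to do it.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical sample data mirroring the first rows of the question's tibble
df1 <- tibble(
  ref_key    = c("123", "123", "123"),
  start_date = as.Date(c("2010-01-08", "2010-01-21", "2010-03-10")),
  end_date   = as.Date(c("2010-01-13", "2010-01-23", "2010-04-14"))
)

# Variant 1: build one daily sequence per row as a list-column, then unnest it
res_map <- df1 %>%
  mutate(date = map2(start_date, end_date, ~ seq(.x, .y, by = "day"))) %>%
  select(ref_key, date) %>%
  unnest(date)

# Variant 2: no per-row function calls.
# n_days is the inclusive length of each range; every row's key and start
# date are repeated n_days times, then offsets 0..(n_days - 1) are added.
n_days <- as.integer(df1$end_date - df1$start_date) + 1L
res_vec <- tibble(
  ref_key = rep(df1$ref_key, n_days),
  date    = rep(df1$start_date, n_days) + (sequence(n_days) - 1L)
)

head(res_vec, 9)
```

The second variant assumes that end_date is never earlier than start_date and that neither column contains NA; rows violating that would need to be filtered out first. On large inputs it is likely to be the faster of the two, since it replaces tens of thousands of seq calls with a handful of vector operations, but that is worth measuring on your own data.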
