根据日期合并长格式的两个数据帧 [英] Merging two data frames in long format based on date

查看:82
本文介绍了根据日期合并长格式的两个数据帧的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有2个数据框,一个( df1 )记录了每天发生的不同活动,另一个( df2 )记录当天发生的活动的属性。

I have a 2 data frames, one (df1) that records the daily occurrence of different activities and another (df2) that records properties of the occurred activity during the day.

df1 可以识别重复活动以及持续时间。一天的开始时间由 Date 变量指定。

From df1 it is possible to identify the repeated occurrence of an activity as well the duration. When the day starts is specified by the Date variable.

例如:


  • id 12发生从第1天开始,到d7结束。在这种情况下,发生次数为7,持续时间为11。

  • 对于 id 123,该周从第5天开始,直到d7;由于在第6天有间隔天,持续时间为6,id 123(从第6天开始,到第7天结束)连续发生了两次,持续时间为6,所以重复出现。

  • id 12 the occurrence starts at day1 and ends at d7. In this case the occurrence is 7 and duration is 11.
  • for id 123 the week starts at day 5 and ends at d7; occurred in repeated order because of there are gap days at day 6 and duration is 6 and id 123 (starts at day6 ends at day 7) occurred 2 times consecutively and duration 6.

df1 中,变量Date定义记录开始的日期。例如,id 12记录从第1天开始,依此类推。

In df1 the variable Date defines the day when the record started. For example id 12 record started at day1 and so on.

我想确定在连续出现期间是否在<$ c $中有关于活动属性的记录c> df2 。

I would like to identify if during the consecutive occurrence if there are records on the activity properties in df2.

例如id 12,出现7次,持续时间为12,则有星期三的记录( df1 中的day3)并且该记录对应于连续发生的3天。对于id 123,没有数据(例如,没有连续出现),但对于id 6,它连续6天出现,持续时间18,则在第6天有记录。

For example id 12, occurred 7 times and duration is 12 there is record for Wednesday (day3 in df1) and this record corresponds to the 3 day of the consecutive occurrence. For id 123 there is no data (eg. no consecutive occurrence) but for id 10 for 6 day occurrence and duration 18 there is a record on the 6th day.

Df1 :

id   day1 day2 day3 day4 day5 day6  day7   Date
 12    2    1    2    1    1    3    1     Mon
123    0    3    0    3    3    0    3     Fri
 10    0    3    3    3    3    3    3     Sat

Df2:

    id   c1 c2  Date
    12   3   3   Wednesday
   123   3   2   Fri
     10  3   1   Sat

结果:

 id c1 c2  Occurrence Position
 12 3   3     7          3
123 0   0     0          0
 10 3   1     2          1

样本数据:df1

structure(list(id = c(12L, 123L, 10L), day1 = c(2L, 0L, 3L), 
    day2 = c(1L, 3L, 3L), day3 = c(2L, 0L, 3L), day4 = c(1L, 
    3L, 3L), day5 = c(1L, 3L, 3L), day6 = c(3L, 0L, 3L), day7 = c(1L, 
    3L, 3L), Date = c("Monday", "Friday", "Saturday")), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000002a81a571ef0>)

df2:

structure(list(id = c(12, 123, 10), c1 = c(3, 3, 3), c2 = c(3, 
2, 1), Date = structure(c(3L, 1L, 2L), .Label = c("Friday", "Saturday", 
"Wednesday"), class = "factor")), row.names = c(NA, -3L), class = "data.frame")


推荐答案

使用 dplyr (也许不是最短的):

A solution with dplyr (maybe not the shortest one):

# library
library(tidyverse)

# get data
df1 <- structure(list(id = c(12L, 123L, 10L), 
               day1 = c(2L, 0L, 3L), 
               day2 = c(1L, 3L, 3L), 
               day3 = c(2L, 0L, 3L), 
               day4 = c(1L,3L, 3L), 
               day5 = c(1L, 3L, 3L), 
               day6 = c(3L, 0L, 3L), 
               day7 = c(1L,3L, 3L), 
               Date = c("Monday", "Friday", "Saturday")),
               row.names = c(NA,-3L), class = c("data.table", "data.frame"))


df2 <- structure(list(id = c(12, 123, 10),
                      c1 = c(3, 3, 3), 
                      c2 = c(3, 2, 1),
                      Date = structure(c(3L, 1L, 2L), .Label = c("Friday", "Saturday","Wednesday"),
                                       class = "factor")), row.names = c(NA, -3L), class = "data.frame")


# change days to nummeric (will help you later)
df1 %>% mutate(
  Date_nr_df1=case_when(
    Date=="Monday" ~ 1,
    Date=="Tuesday" ~2,
    Date=="Wednesday" ~3,
    Date=="Thursday" ~4,
    Date=="Friday" ~5,
    Date=="Saturday" ~6,
    Date=="Sunday" ~7)) -> df1

df2 %>% mutate(
  Date_nr_df2=case_when(
    Date=="Monday" ~ 1,
    Date=="Tuesday" ~2,
    Date=="Wednesday" ~3,
    Date=="Thursday" ~4,
    Date=="Friday" ~5,
    Date=="Saturday" ~6,
    Date=="Sunday" ~7)) -> df2

# combine data by the id column
left_join(df1,df2, by=c("id")) -> df     

# adjust data   
df %>%
  group_by(id) %>% # to make changes per row
  mutate(days=paste0(day1,day2,day3,day4,day5,day6,day7)) %>% #pastes the values together
  mutate(days_correct=substring(days,Date_nr_df1)) %>% # applies the start day
  mutate(Occurrence_seq=str_split(days_correct, fixed("0"))[[1]][1]) %>% # extracts all days before 0
  mutate(Occurrence=nchar(Occurrence_seq)) %>%  ## counts these days
  mutate(Occurrence=case_when(Occurrence==1 ~ 0, TRUE ~ as.numeric(Occurrence))) %>% # sets Occurrence to 0 if there is no consecutive occurrence
  mutate(Position=Date_nr_df2-Date_nr_df1+1) %>% ## calculates the position you wanted
  mutate(c1=case_when(Occurrence==0 ~0, TRUE ~ c1),
         c2=case_when(Occurrence==0 ~0, TRUE ~c1),
         Position=case_when(Occurrence==0 ~ 0, TRUE ~ as.numeric(Position))) %>% 
  ungroup() %>% ungroups the df
  select(id,c1,c2,Occurrence,Position) # selects the wanted variables
#> # A tibble: 3 x 5
#>      id    c1    c2 Occurrence Position
#>   <dbl> <dbl> <dbl>      <dbl>    <dbl>
#> 1    12     3     3          7        3
#> 2   123     0     0          0        0
#> 3    10     3     3          2        1

reprex包(v0.2.1)

Created on 2020-04-10 by the reprex package (v0.2.1)

这篇关于根据日期合并长格式的两个数据帧的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆