Adding missing date values in a data frame with multiple observation periods

Problem description

Thanks in advance.

I am trying to add date values that are missing from the observation periods of three different individuals.

My data look like this (first six rows):

 IndID       Date Event Number Percent
1   P01 2011-03-04     1      2   0.390
2   P01 2011-03-11     1      2   0.975
3   P01 2011-03-13     0      9   0.795
4   P01 2011-03-14     0     10   0.516
5   P01 2011-03-15     0      1   0.117
6   P01 2011-03-17     0      7   0.093

IndID is the individual ID (P01, P03, P06). Date is the observation date. Event is a binary variable indicating whether an event occurred (0 = no, 1 = yes).
The columns Number and Percent are not directly relevant, but they need to be preserved, so they are included here.

My sample data frame (PostData) is provided below as dput output.

For each IndID, the first and last Date mark the beginning and end of that individual's observation period, and there are missing dates within that period. My goal is to add the missing dates for each individual, with a 0 in the Event column. The other columns (Number and Percent) can be left blank (NA).

This post has been useful, but it lacks information on my main problem: handling multiple individuals.

The observation period for each individual runs from min(Date) to max(Date) within that individual's rows, not over the whole of PostData$Date. I have been trying to create a complete Date sequence for each individual and then merge it with the existing data frame inside a for loop, but there must be a better way.
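To make the per-individual window concrete, here is a minimal base R sketch. It uses a small hypothetical data frame `toy` (with `Date` values rather than POSIXct, for simplicity) instead of the full PostData: `split()` groups the dates by individual, `range()` gives each individual's first and last day, and the window length minus the number of observed rows is the number of dates that have to be added.

```r
# Hypothetical toy data (not PostData): two individuals with gaps in their records
toy <- data.frame(
  IndID = factor(c("P01", "P01", "P01", "P03", "P03")),
  Date  = as.Date(c("2011-03-04", "2011-03-07", "2011-03-08",
                    "2011-07-08", "2011-07-11")),
  Event = c(1L, 0L, 1L, 1L, 0L)
)

# Dates grouped by individual; drop = TRUE skips factor levels with no rows
dates_by_id <- split(toy$Date, toy$IndID, drop = TRUE)

# First and last observed day of each individual's own observation period
windows <- lapply(dates_by_id, range)

# Days in each window (inclusive), observed rows, and dates still missing
n_days    <- vapply(windows, function(r) as.integer(diff(r)) + 1L, integer(1))
n_obs     <- vapply(dates_by_id, length, integer(1))
n_missing <- n_days - n_obs

n_days     # P01: 5, P03: 4
n_missing  # P01: 2, P03: 2

# By contrast, the global range lumps both individuals into one period
as.integer(diff(range(toy$Date))) + 1L  # 130 days
```

Using the global range would pad every individual out to 130 days, which is why the sequence has to be built separately for each IndID.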

Any suggestions would be appreciated.

PostData <-structure(list(IndID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
  3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
  5L, 5L), .Label = c("P01", "P02", "P03", "P05", "P06", "P07", 
  "P08", "P09", "P10", "P11", "P12", "P13"), class = "factor"), 
  Date = structure(c(1299196800, 1299801600, 1299974400, 1300060800, 
  1300147200, 1300320000, 1300406400, 1310083200, 1310169600, 
  1310515200, 1310774400, 1310947200, 1311033600, 1311292800, 
  1311552000, 1323129600, 1323388800, 1323648000, 1323993600, 
  1324080000, 1324166400, 1324339200, 1327622400, 1327795200, 
  1327881600), class = c("POSIXct", "POSIXt"), tzone = "GMT"), 
  Event = c(1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 
  0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L), Number = c(2L, 
  2L, 9L, 10L, 1L, 7L, 5L, 9L, 1L, 4L, 5L, 2L, 0L, 1L, 10L, 
  5L, 0L, 6L, 5L, 10L, 9L, 4L, 4L, 8L, 1L), Percent = c(0.39, 
  0.975, 0.795, 0.516, 0.117, 0.093, 0.528, 0.659, 0.308, 0.055, 
  0.185, 0.761, 0.132, 0.676, 0.368, 0.383, 0.272, 0.113, 0.974, 
  0.696, 0.941, 0.751, 0.758, 0.29, 0.15)), .Names = c("IndID", 
  "Date", "Event", "Number", "Percent"), row.names = c(NA, 25L), 
  class = "data.frame")


Recommended answer

Base R version:

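do.call(rbind,
  by(
    PostData,
    PostData$IndID,  # one group per individual
    function(x) {
      # Complete daily sequence from this individual's first to last date,
      # merged with the observed rows; unmatched days get NA in the other columns
      out <- merge(
        data.frame(
          IndID=x$IndID[1],
          Date=seq.POSIXt(min(x$Date),max(x$Date),by="1 day")
        ),
        x,
        all.x=TRUE
      )
      # A day with no record counts as "no event"; 0L keeps Event an integer
      out$Event[is.na(out$Event)] <- 0L
      out
    }
  )
)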

Result:

       IndID       Date Event Number Percent
P01.1    P01 2011-03-04     1      2   0.390
P01.2    P01 2011-03-05     0     NA      NA
P01.3    P01 2011-03-06     0     NA      NA
P01.4    P01 2011-03-07     0     NA      NA
P01.5    P01 2011-03-08     0     NA      NA
P01.6    P01 2011-03-09     0     NA      NA
P01.7    P01 2011-03-10     0     NA      NA
P01.8    P01 2011-03-11     1      2   0.975
<<etc>>
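For comparison, the same per-individual fill can be written with `split()` and `lapply()` in place of `by()`. This is a minimal sketch, not part of the original answer. It assumes `Date` values (for PostData, converting with `as.Date(PostData$Date)` keeps the calendar day for these midnight-GMT timestamps), and it uses a small hypothetical `toy` data frame whose values are made up for illustration; the helper name `fill_days` is likewise hypothetical.

```r
# Hypothetical toy data with gaps; Number/Percent stand in for the extra columns
toy <- data.frame(
  IndID   = factor(c("P01", "P01", "P03", "P03")),
  Date    = as.Date(c("2011-03-04", "2011-03-07", "2011-07-08", "2011-07-10")),
  Event   = c(1L, 0L, 1L, 0L),
  Number  = c(2L, 9L, 9L, 1L),
  Percent = c(0.39, 0.795, 0.659, 0.308)
)

# Hypothetical helper: fill one individual's missing days between its first and last date
fill_days <- function(x) {
  full <- data.frame(
    IndID = x$IndID[1],
    Date  = seq(min(x$Date), max(x$Date), by = "day")
  )
  out <- merge(full, x, all.x = TRUE)  # unmatched days get NA in Event/Number/Percent
  out$Event[is.na(out$Event)] <- 0L    # a missing day means no event
  out
}

# Split by individual (drop = TRUE skips factor levels with no rows),
# fill each piece, then stack the pieces back together
completed <- do.call(rbind, lapply(split(toy, toy$IndID, drop = TRUE), fill_days))
rownames(completed) <- NULL
completed
```

Resetting the row names is optional; it only replaces labels such as P01.1 with a plain 1..n sequence.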
