在具有多个观察周期的数据框中添加缺少的日期值 [英] Adding missing date values in a data frame with multiple observation periods
问题描述
提前感谢
我正在为三个不同的个人添加一个未包含在观察期内的缺少日期值。
I am trying to add missing date values that were not included in a observation period for three different individuals.
我的数据如下所示:
IndID Date Event Number Percent
1 P01 2011-03-04 1 2 0.390
2 P01 2011-03-11 1 2 0.975
3 P01 2011-03-13 0 9 0.795
4 P01 2011-03-14 0 10 0.516
5 P01 2011-03-15 0 1 0.117
6 P01 2011-03-17 0 7 0.093
IndID
是个人ID( P01
, P03
, P06
)。 日期
显然是Date。 事件
是一个二进制变量,指示事件是否发生( 0
=否和 1
= yes)。
列数字
和百分比
不直接相关,但需要保留,因此包含在这里。
IndID
is the individual ID (P01
, P03
, P06
). Date
is obviously the Date. Event
is a binary variable indicating whether an event occurred (0
= no and 1
= yes).
Columns Number
and Percent
are not directly relevant, but need to be preserved and are thus included here.
我的示例数据框( PostData
)包含在下面使用 dput
。
My sample data frame (PostData
) is included below using dput
.
对于每个 IndID
最后日期
分别是观察期的开始和结束,其中缺少日期。在这里,我的目标是为每个人添加缺少的日期,并在 Event
列中添加一个 0
。其他列(号
和 Percent
)可以保留为空。
For each IndID
the first and last Date
are the beginning and end of an observation period respectively, within which there are missing dates. Here, my goal is to add the missing dates for each individual and add a 0
in the Event
column. The other columns (Number
and Percent
) can remain blank.
这篇文章有有用的,但缺乏关于我的主要问题的信息 - 多个人。
This post has been useful, but lacks info on my main problem - multiple individuals.
每个人的观察期是从 min(PostData $ Date)
到 max(PostData $ Date)
。我一直在尝试为每个人创建一个完整的日期序列,然后为合并
与中的现有数据框
循环。有一个更好的主意。
The observation period for each individual is from min(PostData$Date)
to max(PostData$Date)
. I have been attempting to create a complete Date sequence for each individual and then merge
it with the existing data frame within a for
loop. There is surely a better idea.
任何建议都不胜感激。
PostData <-structure(list(IndID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L), .Label = c("P01", "P02", "P03", "P05", "P06", "P07",
"P08", "P09", "P10", "P11", "P12", "P13"), class = "factor"),
Date = structure(c(1299196800, 1299801600, 1299974400, 1300060800,
1300147200, 1300320000, 1300406400, 1310083200, 1310169600,
1310515200, 1310774400, 1310947200, 1311033600, 1311292800,
1311552000, 1323129600, 1323388800, 1323648000, 1323993600,
1324080000, 1324166400, 1324339200, 1327622400, 1327795200,
1327881600), class = c("POSIXct", "POSIXt"), tzone = "GMT"),
Event = c(1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L,
0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L), Number = c(2L,
2L, 9L, 10L, 1L, 7L, 5L, 9L, 1L, 4L, 5L, 2L, 0L, 1L, 10L,
5L, 0L, 6L, 5L, 10L, 9L, 4L, 4L, 8L, 1L), Percent = c(0.39,
0.975, 0.795, 0.516, 0.117, 0.093, 0.528, 0.659, 0.308, 0.055,
0.185, 0.761, 0.132, 0.676, 0.368, 0.383, 0.272, 0.113, 0.974,
0.696, 0.941, 0.751, 0.758, 0.29, 0.15)), .Names = c("IndID",
"Date", "Event", "Number", "Percent"), row.names = c(NA, 25L),
class = "data.frame")
推荐答案
基本R版本:
do.call(rbind,
by(
PostData,
PostData$IndID,
function(x) {
out <- merge(
data.frame(
IndID=x$IndID[1],
Date=seq.POSIXt(min(x$Date),max(x$Date),by="1 day")
),
x,
all.x=TRUE
)
out$Event[is.na(out$Event)] <- 0
out
}
)
)
结果:
IndID Date Event Number Percent
P01.1 P01 2011-03-04 1 2 0.390
P01.2 P01 2011-03-05 0 NA NA
P01.3 P01 2011-03-06 0 NA NA
P01.4 P01 2011-03-07 0 NA NA
P01.5 P01 2011-03-08 0 NA NA
P01.6 P01 2011-03-09 0 NA NA
P01.7 P01 2011-03-10 0 NA NA
P01.8 P01 2011-03-11 1 2 0.975
<<etc>>
这篇关于在具有多个观察周期的数据框中添加缺少的日期值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!