Insert rows with zeros in data frames in R
Question
Consider a fragmented dataset like this:
   ID       Date Value
1   1 2012-01-01  5065
4   1 2012-01-04  1508
5   1 2012-01-05  9489
6   1 2012-01-06  7613
7   2 2012-01-07  6896
8   2 2012-01-08  2643
11  3 2012-01-02  7294
12  3 2012-01-03  8726
13  3 2012-01-04  6262
14  3 2012-01-05  2999
15  3 2012-01-06 10000
16  3 2012-01-07  1405
18  3 2012-01-09  8372
Notice that the observations for rows (2,3,9,10,17) are missing. What I would like is to fill some of these gaps in the dataset with "Value" = 0, like so:
   ID       Date Value
1   1 2012-01-01  5920
2   1 2012-01-02     0
3   1 2012-01-03     0
4   1 2012-01-04  8377
5   1 2012-01-05  7810
6   1 2012-01-06  6452
7   2 2012-01-07  3483
8   2 2012-01-08  5426
9   2 2012-01-09     0
11  3 2012-01-02  7854
12  3 2012-01-03  1948
13  3 2012-01-04  7141
14  3 2012-01-05  5402
15  3 2012-01-06  6412
16  3 2012-01-07  7043
17  3 2012-01-08     0
18  3 2012-01-09  3270
The point is that zeros should only be inserted if there is a past observation for the same (grouped) ID. I would like to avoid any loops, as the full dataset is quite large.
Any suggestions? To reproduce the data frame:
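df <- data.frame(matrix(0, nrow = 18, ncol = 3,
                        dimnames = list(NULL, c("ID", "Date", "Value"))))
df[, 1] <- c(1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3,3,3)
# The 9-day sequence is recycled across the 18 rows
df[, 2] <- seq(as.Date("2012-01-01"),
               as.Date("2012-01-09"),
               by = 1)
# Random values, so they differ on every run
df[, 3] <- sample(1000:10000, 18, replace = TRUE)
# Drop rows to create the gaps
df <- df[-c(2, 3, 9, 10, 17), ]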
Recommended answer
There are already some solid answers here, but I would recommend checking out the padr package.
library(dplyr)
library(padr)
df %>%
pad(start_val = as.Date("2012-01-01"),
end_val = as.Date("2012-01-09"),
group = "ID") %>%
fill_by_value(Value)
The package also provides some fairly intuitive functions for summarizing Date columns.
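For readers who cannot install padr, the same idea can be sketched in base R without explicit loops. This is only an illustration under stated assumptions, not the answer's method: it assumes that each ID should be filled from its own first observed date up to the last date in the whole data set (the question does not pin down where each ID's window ends), and the names `grid`, `first_obs` and `out` are made up for this example.

```r
# Rebuild the example data; the values are random, so they will not
# match the tables in the question.
set.seed(1)
df <- data.frame(
  ID    = c(1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3,3,3),
  Date  = rep(seq(as.Date("2012-01-01"), as.Date("2012-01-09"), by = 1), 2),
  Value = sample(1000:10000, 18, replace = TRUE)
)
df <- df[-c(2, 3, 9, 10, 17), ]

# Every ID paired with every date in the overall range.
grid <- expand.grid(ID   = unique(df$ID),
                    Date = seq(min(df$Date), max(df$Date), by = 1))

# First observed date per ID (as a number), used to drop grid rows that
# fall before an ID has any past observation.
first_obs <- tapply(as.numeric(df$Date), df$ID, min)
grid <- grid[as.numeric(grid$Date) >= first_obs[as.character(grid$ID)], ]

# Left join the observations onto the grid; unmatched rows get NA,
# which is then replaced by 0.
out <- merge(grid, df, by = c("ID", "Date"), all.x = TRUE)
out$Value[is.na(out$Value)] <- 0
out <- out[order(out$ID, out$Date), ]
```

Under this assumption ID 1 also receives zero rows for 2012-01-07 through 2012-01-09, which the table in the question does not show; to stop at each ID's last observation instead, the grid could be filtered against a per-ID maximum in the same way.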