Insert missing time rows into a dataframe
Let's say I have a dataframe:
df <- data.frame(group = c('A','A','A','B','B','B'),
time = c(1,2,4,1,2,3),
data = c(5,6,7,8,9,10))
What I want to do is insert rows into the data frame wherever a time value is missing from the sequence. In the example above, group A is missing time = 3 and group B is missing time = 4. For those rows I would essentially want to put 0 in the data column.
How would I go about adding these additional rows?
The goal would be:
df <- data.frame(group = c('A','A','A','A','B','B','B','B'),
time = c(1,2,3,4,1,2,3,4),
data = c(5,6,0,7,8,9,10,0))
My real data is a couple thousand data points, so manually doing so isn't possible.
You can try merge/expand.grid
res <- merge(
expand.grid(group=unique(df$group), time=unique(df$time)),
df, all=TRUE)
res$data[is.na(res$data)] <- 0
res
# group time data
#1 A 1 5
#2 A 2 6
#3 A 3 0
#4 A 4 7
#5 B 1 8
#6 B 2 9
#7 B 3 10
#8 B 4 0
Or using data.table
library(data.table)
setkey(setDT(df), group, time)[CJ(group=unique(group), time=unique(time))
][is.na(data), data:=0L]
# group time data
#1: A 1 5
#2: A 2 6
#3: A 3 0
#4: A 4 7
#5: B 1 8
#6: B 2 9
#7: B 3 10
#8: B 4 0
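The chained data.table call above is compact, so here is a rough step-by-step sketch of the same idea. The intermediate names dt, grid and res are only illustrative, and as.data.table() is used instead of setDT() so that the original df is copied rather than modified by reference.

```r
library(data.table)

df <- data.frame(group = c('A','A','A','B','B','B'),
                 time  = c(1,2,4,1,2,3),
                 data  = c(5,6,7,8,9,10))

# Copy df into a data.table and key it on the join columns
dt <- as.data.table(df)
setkey(dt, group, time)

# CJ() builds every group/time combination (a sorted cross join)
grid <- CJ(group = unique(dt$group), time = unique(dt$time))

# Joining on the key keeps every row of grid;
# combinations absent from dt get NA in the data column
res <- dt[grid]

# Replace the NAs introduced by the join with 0
res[is.na(data), data := 0]
res
```

This should give the same eight rows as the table above.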
Update
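As @thelatemail mentioned in the comments, the above method fails if a particular time value is missing from every group, because unique(df$time) then never contains it. Building the grid from the full range of time (assuming time takes consecutive integer values) is more general: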
res <- merge(
expand.grid(group=unique(df$group),
time=min(df$time):max(df$time)),
df, all=TRUE)
res$data[is.na(res$data)] <- 0
and similarly replace time=unique(time) with time=min(time):max(time) in the data.table solution.
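To see the difference between the two grids, here is a small sketch in base R. The data frame df2 is made up for illustration (it is not from the question): time = 3 does not occur in any group, so only the range-based grid can restore it.

```r
# Hypothetical data: time = 3 is missing from every group
df2 <- data.frame(group = c('A','A','B','B'),
                  time  = c(1,2,2,4),
                  data  = c(5,6,9,10))

# Grid from observed values only: time = 3 never appears in it
res_unique <- merge(
  expand.grid(group = unique(df2$group), time = unique(df2$time)),
  df2, all = TRUE)
res_unique$data[is.na(res_unique$data)] <- 0

# Grid from the full min:max range: time = 3 is included
res_range <- merge(
  expand.grid(group = unique(df2$group),
              time = min(df2$time):max(df2$time)),
  df2, all = TRUE)
res_range$data[is.na(res_range$data)] <- 0

nrow(res_unique)          # 6 rows: 2 groups x 3 observed times
nrow(res_range)           # 8 rows: 2 groups x 4 times in 1:4
3 %in% res_unique$time    # FALSE
3 %in% res_range$time     # TRUE
```

Note that min:max steps by 1, so this only fits data where time is expected at every integer; for other spacings something like seq() with an explicit by would be needed.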