Adding missing dates to dataframe


Problem description

I have a data frame which looks like this:

    times                      values
1   2013-07-06 20:00:00        0.02
2   2013-07-07 20:00:00        0.03
3   2013-07-09 20:00:00        0.13
4   2013-07-10 20:00:00        0.12
5   2013-07-11 20:00:00        0.03
6   2013-07-14 20:00:00        0.06
7   2013-07-15 20:00:00        0.08
8   2013-07-16 20:00:00        0.07
9   2013-07-17 20:00:00        0.08

There are a few dates missing from the data, and I would like to insert them and to carry over the value from the previous day into these new rows, i.e. obtain this:

    times                      values
1   2013-07-06 20:00:00        0.02
2   2013-07-07 20:00:00        0.03
3   2013-07-08 20:00:00        0.03
4   2013-07-09 20:00:00        0.13
5   2013-07-10 20:00:00        0.12
6   2013-07-11 20:00:00        0.03
7   2013-07-12 20:00:00        0.03
8   2013-07-13 20:00:00        0.03
9   2013-07-14 20:00:00        0.06
10  2013-07-15 20:00:00        0.08
11  2013-07-16 20:00:00        0.07
12  2013-07-17 20:00:00        0.08
...

I have been trying to use a vector of all the dates:

dates <- as.Date(1:length(df),origin = df$times[1])

I am stuck and can't find a way to do this without a horrible for loop in which I keep getting lost. Thank you for your help.

Solution

Some test data (I am using Date, yours seems to be a different type, but this does not affect the algorithm):

data = data.frame(dates = as.Date(c("2011-12-15", "2011-12-17", "2011-12-19")), 
                  values = as.double(1:3))

# Generate **all** timestamps at which you want to have your result. 
# I use `seq`, but you may use any other method of generating those timestamps. 

alldates = seq(min(data$dates), max(data$dates), 1)

# Keep only the timestamps that are not already present in your `data.frame`,
# then construct a `data.frame` of missing values to append:
dates0 = alldates[!(alldates %in% data$dates)]
data0 = data.frame(dates = dates0, values = NA_real_)

# Append this `data.frame` and re-sort by time:
data = rbind(data, data0)
data = data[order(data$dates),]

# Forward-fill the values.
# (I would recommend moving this code into a separate `ffill` function;
# it has proved to be very useful in general.)
current = NA_real_
data$values = sapply(data$values, function(x) { 
           current <<- ifelse(is.na(x), current, x); current })
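
The forward-fill step above can be pulled out into a helper, as the comment suggests. What follows is only a minimal sketch and not part of the original answer: the name `ffill`, the vectorised `cumsum` approach (swapped in for the `sapply`/`<<-` loop), the use of `merge` instead of `rbind` plus `order`, and the `"UTC"` time zone are all assumptions made for illustration. It rebuilds the question's data as POSIXct to show that the same idea works for date-times.

```r
# Hypothetical helper: vectorised forward fill for a plain atomic vector.
ffill <- function(x) {
  seen <- !is.na(x)
  # cumsum(seen) is the position of the most recent non-NA value;
  # 0 (nothing seen yet) is shifted to the leading NA.
  c(NA, x[seen])[cumsum(seen) + 1]
}

# The question's data, rebuilt by hand (the time zone is an assumption).
days <- c("06", "07", "09", "10", "11", "14", "15", "16", "17")
df <- data.frame(
  times  = as.POSIXct(paste0("2013-07-", days, " 20:00:00"), tz = "UTC"),
  values = c(0.02, 0.03, 0.13, 0.12, 0.03, 0.06, 0.08, 0.07, 0.08)
)

# One row per day between the first and the last timestamp.
alltimes <- data.frame(times = seq(min(df$times), max(df$times), by = "day"))

# Left join: days missing from df get NA in `values`; merge sorts by `times`.
filled <- merge(alltimes, df, by = "times", all.x = TRUE)

# Carry the previous day's value into the new rows.
filled$values <- ffill(filled$values)
```

`filled` then has one row per day from 2013-07-06 to 2013-07-17. Note that this `ffill` is meant for plain numeric, character or logical vectors; it would drop the class of a `Date` or factor. If adding a dependency is acceptable, `zoo::na.locf` performs the same kind of fill.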
