Adding missing dates to dataframe
Question
I have a data frame which looks like this:
```
                times values
1 2013-07-06 20:00:00   0.02
2 2013-07-07 20:00:00   0.03
3 2013-07-09 20:00:00   0.13
4 2013-07-10 20:00:00   0.12
5 2013-07-11 20:00:00   0.03
6 2013-07-14 20:00:00   0.06
7 2013-07-15 20:00:00   0.08
8 2013-07-16 20:00:00   0.07
9 2013-07-17 20:00:00   0.08
```
There are a few dates missing from the data, and I would like to insert them and to carry over the value from the previous day into these new rows, i.e. obtain this:
```
                 times values
1  2013-07-06 20:00:00   0.02
2  2013-07-07 20:00:00   0.03
3  2013-07-08 20:00:00   0.03
4  2013-07-09 20:00:00   0.13
5  2013-07-10 20:00:00   0.12
6  2013-07-11 20:00:00   0.03
7  2013-07-12 20:00:00   0.03
8  2013-07-13 20:00:00   0.03
9  2013-07-14 20:00:00   0.06
10 2013-07-15 20:00:00   0.08
11 2013-07-16 20:00:00   0.07
12 2013-07-17 20:00:00   0.08
...
```
I have been trying to use a vector of all the dates:
```r
dates <- as.Date(1:length(df), origin = df$times[1])
```
I am stuck, and I can't find a way to do it without a horrible `for` loop in which I keep getting lost... Thank you for your help!
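A note on the attempt above: for a data frame, `length(df)` returns the number of columns (2 here), not the number of rows, and the offsets in `1:length(df)` start at 1 rather than 0; `as.Date` also drops the 20:00:00 time of day. A minimal sketch of building the full daily sequence instead, assuming `times` is stored as `POSIXct` in UTC (the question does not state the class or time zone, and the data frame below is simply re-typed from the table above):

```r
# Re-typed sample of the question's data; POSIXct in UTC is an assumption.
df <- data.frame(
  times = as.POSIXct(c("2013-07-06 20:00:00", "2013-07-07 20:00:00",
                       "2013-07-09 20:00:00", "2013-07-10 20:00:00",
                       "2013-07-11 20:00:00", "2013-07-14 20:00:00",
                       "2013-07-15 20:00:00", "2013-07-16 20:00:00",
                       "2013-07-17 20:00:00"), tz = "UTC"),
  values = c(0.02, 0.03, 0.13, 0.12, 0.03, 0.06, 0.08, 0.07, 0.08)
)

nrow(df)     # 9 rows
length(df)   # 2: the number of columns, not rows

# One timestamp per calendar day from the first to the last observation,
# keeping the 20:00:00 time of day
all_times <- seq(min(df$times), max(df$times), by = "day")
length(all_times)   # 12
```

`seq()` with `by = "day"` steps by calendar days, so the result stays `POSIXct` and lines up with the existing timestamps; this is the same idea as the `alldates` step in the answer, applied to date-times instead of `Date`.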
Solution
Some test data (I am using `Date`; yours seems to be a different type, but this does not affect the algorithm):

```r
data = data.frame(dates = as.Date(c("2011-12-15", "2011-12-17", "2011-12-19")),
                  values = as.double(1:3))

# Generate **all** timestamps at which you want to have your result.
# I use `seq`, but you may use any other method of generating those timestamps.
alldates = seq(min(data$dates), max(data$dates), 1)

# Filter out timestamps that are already present in your `data.frame`,
# then construct a `data.frame` of the missing ones to append:
dates0 = alldates[!(alldates %in% data$dates)]
data0 = data.frame(dates = dates0, values = NA_real_)

# Append this `data.frame` and re-sort in time:
data = rbind(data, data0)
data = data[order(data$dates), ]

# Forward-fill the values.
# I would recommend moving this code into a separate `ffill` function
# (it has proved to be very useful in general):
current = NA_real_
data$values = sapply(data$values, function(x) {
  current <<- ifelse(is.na(x), current, x)
  current
})
```
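The answer recommends moving the forward fill into a separate `ffill` function. Below is a minimal sketch of such a helper; the name `ffill` and its vectorised `cummax` implementation are hypothetical, not taken from the answer. It is applied to the question's own data (re-typed, `POSIXct` in UTC assumed), and it swaps the `rbind` + `order` steps for a left join with base R's `merge(..., all.x = TRUE)`:

```r
# Hypothetical helper: replace each NA with the most recent non-NA value
# before it. Leading NAs (no earlier value) stay NA.
ffill <- function(x) {
  # Position of each non-NA element, 0 where the element is NA
  idx <- ifelse(is.na(x), 0L, seq_along(x))
  # Running maximum: position of the last non-NA element seen so far
  idx <- cummax(idx)
  # 0 means no earlier non-NA value exists, so those entries stay NA
  idx[idx == 0L] <- NA_integer_
  x[idx]
}

# Re-typed sample of the question's data; POSIXct in UTC is an assumption.
df <- data.frame(
  times = as.POSIXct(c("2013-07-06 20:00:00", "2013-07-07 20:00:00",
                       "2013-07-09 20:00:00", "2013-07-10 20:00:00",
                       "2013-07-11 20:00:00", "2013-07-14 20:00:00",
                       "2013-07-15 20:00:00", "2013-07-16 20:00:00",
                       "2013-07-17 20:00:00"), tz = "UTC"),
  values = c(0.02, 0.03, 0.13, 0.12, 0.03, 0.06, 0.08, 0.07, 0.08)
)

# Every day in the observed range, as a one-column data frame
all_days <- data.frame(times = seq(min(df$times), max(df$times), by = "day"))

# Left join: all days are kept (sorted by `times`), missing days get NA
filled <- merge(all_days, df, by = "times", all.x = TRUE)

# Carry the previous day's value into the new rows
filled$values <- ffill(filled$values)
filled
```

Because the helper works on positions rather than updating a variable with `<<-`, it carries no hidden state between calls and can be reused on any vector; leading `NA`s stay `NA`, just as in the `sapply` version.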