Fastest way for filling-in missing dates for data.table

Question

I am loading a data.table from a CSV file that has date, orders, amount, and similar fields.

The input file occasionally does not have data for all dates. For example:

> NADayWiseOrders
           date orders  amount guests
  1: 2013-01-01     50 2272.55    149
  2: 2013-01-02      3   64.04      4
  3: 2013-01-04      1   18.81      0
  4: 2013-01-05      2   77.62      0
  5: 2013-01-07      2   35.82      2

In the output above, 03-Jan and 06-Jan have no entries.

I would like to either fill the missing entries with default values (say, zero for orders, amount, etc.) or carry the last value forward (e.g., 03-Jan would reuse the 02-Jan values and 06-Jan would reuse the 05-Jan values).

What is the best way to fill in such gaps of missing dates with these default values?

The answer here suggests using allow.cartesian = TRUE and expand.grid for missing weekdays. That may work for weekdays (since there are only 7 of them), but I am not sure it is the right way to handle dates as well, especially when dealing with multi-year data.

Recommended answer

Not sure if it's the fastest, but it will work as long as the data contains no NAs of its own (na.locf would fill those in as well):

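library(data.table)
library(zoo)  # provides na.locf(); loading xts attaches zoo as well

# Just in case these aren't Dates.
NADayWiseOrders$date <- as.Date(NADayWiseOrders$date)
# All desired dates: one row per day from the first to the last observed date.
alldates <- data.table(date = seq.Date(min(NADayWiseOrders$date), max(NADayWiseOrders$date), by = "day"))
# Merge; dates absent from the input get NA in every other column.
dt <- merge(NADayWiseOrders, alldates, by = "date", all = TRUE)
# Now carry the last observation forward (alternatively, set the NAs to 0).
dt <- na.locf(dt)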
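
For comparison, here is a minimal sketch of both fill strategies from the question (zeros, or last value carried forward) using only data.table. It is not benchmarked and is not claimed to be the fastest option. The names orders_dt, all_dates, full_dt, value_cols, filled_zero and filled_locf are made up for illustration, the sample rows are copied from the question, and nafill() is assumed to be available (it needs a reasonably recent data.table release and works on numeric columns).

```r
library(data.table)

# Sample data copied from the question (03-Jan and 06-Jan are missing).
orders_dt <- data.table(
  date   = as.Date(c("2013-01-01", "2013-01-02", "2013-01-04",
                     "2013-01-05", "2013-01-07")),
  orders = c(50L, 3L, 1L, 2L, 2L),
  amount = c(2272.55, 64.04, 18.81, 77.62, 35.82),
  guests = c(149L, 4L, 0L, 0L, 2L)
)

# One row per calendar day between the first and last observed date.
all_dates <- data.table(
  date = seq(min(orders_dt$date), max(orders_dt$date), by = "day")
)

# Join: every row of all_dates is kept; days without data get NA values.
full_dt <- orders_dt[all_dates, on = "date"]

value_cols <- c("orders", "amount", "guests")

# Option 1: replace the NAs with zero, column by column, by reference.
filled_zero <- copy(full_dt)
for (col in value_cols) {
  set(filled_zero, i = which(is.na(filled_zero[[col]])), j = col, value = 0L)
}

# Option 2: carry the last observation forward.
filled_locf <- copy(full_dt)
filled_locf[, (value_cols) := lapply(.SD, nafill, type = "locf"),
            .SDcols = value_cols]
```

The date sequence grows by only one row per day, so even several years of data adds just a few thousand rows to the join, which is very different from the Cartesian expansion the question was worried about.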
