Fill missing date values in column by adding delivery interval to another date column


Question

Data:

DB1 <- data.frame(orderItemID  = 1:10,     
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),  
deliveryDate = c("2013-01-23", "2013-03-01", "NA", "2013-06-04", "2014-01-03", "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

Expected result:

DB1 <- data.frame(orderItemID  = 1:10,
orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01", "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),
deliveryDate = c("2013-01-23", "2013-03-01", "2013-04-14", "2013-06-04", "2014-01-03", "2014-02-21", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))


Hey guys, it's me again ;) and unfortunately (I think) I have a pretty difficult question... As you can see above, I have some missing values in the delivery dates and I want to replace them with another date. That date should be the order date of the specific item + the average delivery time in (full) days (in the example it's 1.75 days, so 2 days). The average delivery time is the mean over all samples that do not contain missing values = (2 days + 1 day + 3 days + 2 days + 1 day + 2 days + 1 day + 2 days) / 8 = 1.75 days.
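A small base R check of that averaging step, computing the delivery times from the two date columns. It assumes the second delivery date was meant to be 2013-04-01: as typed, 2013-03-01 falls before its order date of 2013-03-31, while the sum above counts that order as 1 day.

```r
# Question's dates; deliveryDate[2] is ASSUMED to be "2013-04-01" (typed as "2013-03-01")
orderDate <- as.Date(c("2013-01-21", "2013-03-31", "2013-04-12", "2013-06-01", "2014-01-01",
                       "2014-02-19", "2014-02-27", "2014-10-02", "2014-10-31", "2014-11-21"),
                     format = "%Y-%m-%d")
deliveryDate <- as.Date(c("2013-01-23", "2013-04-01", NA, "2013-06-04", "2014-01-03",
                          NA, "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"),
                        format = "%Y-%m-%d")

# Delivery time per order in days; NA where the delivery date is missing
delivery_time <- as.numeric(deliveryDate - orderDate)   # 2 1 NA 3 2 NA 1 2 1 2

# Mean over the 8 complete rows, then rounded to full days
avg_days      <- mean(delivery_time, na.rm = TRUE)      # 14 / 8 = 1.75
delivery_days <- round(avg_days)                        # 2
```

With deliveryDate[2] left as typed, that row contributes -30 days and the same mean() call returns -2.125, so that entry is worth checking before filling anything.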


So in a first step the average delivery time needs to be calculated, and in a second step the order date + the average delivery time (in full days) needs to be entered instead of the NAs.


I already tried a little with [is.na(DB1$deliveryDate)], but unfortunately I have no good idea how to solve the problem...
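The is.na() idea does work with plain logical indexing in base R. Below is a minimal sketch of both steps wrapped in a hypothetical helper, fill_delivery_dates() (the name is made up for illustration): it takes the rounded mean delivery time, then writes orderDate + that many days only into the rows whose deliveryDate is NA. It again assumes the second delivery date should read 2013-04-01 rather than 2013-03-01.

```r
# Hypothetical helper: fill NA delivery dates with order date + rounded mean delivery time
fill_delivery_dates <- function(df) {
  df$orderDate    <- as.Date(df$orderDate,    format = "%Y-%m-%d")
  df$deliveryDate <- as.Date(df$deliveryDate, format = "%Y-%m-%d")  # the string "NA" becomes NA

  # Step 1: mean delivery time in days over the rows that have a delivery date
  days_taken    <- as.numeric(df$deliveryDate - df$orderDate)
  delivery_days <- round(mean(days_taken, na.rm = TRUE))

  # Step 2: overwrite only the rows with a missing delivery date
  missing <- is.na(df$deliveryDate)
  df$deliveryDate[missing] <- df$orderDate[missing] + delivery_days
  df
}

DB1 <- data.frame(orderItemID = 1:10,
  orderDate = c("2013-01-21", "2013-03-31", "2013-04-12", "2013-06-01", "2014-01-01",
                "2014-02-19", "2014-02-27", "2014-10-02", "2014-10-31", "2014-11-21"),
  # second value ASSUMED to be "2013-04-01" (typed as "2013-03-01" in the question)
  deliveryDate = c("2013-01-23", "2013-04-01", "NA", "2013-06-04", "2014-01-03",
                   "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

filled <- fill_delivery_dates(DB1)
filled[c(3, 6), ]   # rows 3 and 6 get 2013-04-14 and 2014-02-21
```

Adding a plain number to a Date shifts it by that many days, so no extra package is needed for whole-day offsets.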

Hope someone has an idea.

Recommended answer


You want to do date arithmetic: fill the NAs in the deliveryDate column by adding a date interval of two days to the orderDate column. lubridate supplies convenience functions for time intervals, such as days(), weeks(), months(), years(), hours(), minutes() and seconds(), for exactly that purpose. But first you have to parse your date strings (here in ISO format, YYYY-MM-DD) into R Date objects.
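The format string passed to as.Date() has to describe how the strings are actually written. A quick base R check: the question's dates are YYYY-MM-DD, so a day.month.year pattern such as "%d.%m.%y" yields NA, while "%Y-%m-%d" parses them. For whole days, adding a number to a Date gives the same shift that lubridate's days() produces for a Date.

```r
# Wrong pattern for ISO dates: nothing parses, so the result is NA
bad <- as.Date("2013-04-12", format = "%d.%m.%y")

# Matching pattern; the literal string "NA" simply becomes a missing value
parsed <- as.Date(c("2013-04-12", "NA"), format = "%Y-%m-%d")

# Arithmetic on Date objects counts in days
shifted <- parsed[1] + 2    # 2013-04-14
```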


Something like the following, using lubridate for date arithmetic and dplyr for data frame manipulation:

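library(dplyr)
library(lubridate)

# The dates are ISO strings (YYYY-MM-DD); the string "NA" becomes a real NA
DB1$orderDate    <- as.Date(DB1$orderDate,    format = "%Y-%m-%d")
DB1$deliveryDate <- as.Date(DB1$deliveryDate, format = "%Y-%m-%d")

# Step 1: delivery time per order in days (NA where deliveryDate is missing)
delivery_time <- as.numeric(DB1$deliveryDate - DB1$orderDate)

# This uses the median, which is 2 days for this data, and rounds up.
# The question asks for mean(delivery_time, na.rm = TRUE); that only comes to
# 1.75 if row 2 is corrected, because as typed (ordered 2013-03-31,
# delivered 2013-03-01) it gives -30 days
delivery_days <- ceiling(median(delivery_time, na.rm = TRUE))

# Step 2: overwrite only the missing delivery dates and keep every row
# (filter(is.na(deliveryDate)) would throw away all the complete rows)
DB1 <- DB1 %>%
  mutate(deliveryDate = if_else(is.na(deliveryDate),
                                orderDate + days(delivery_days),
                                deliveryDate))

# the two filled rows:
# orderItemID  orderDate deliveryDate
#           3 2013-04-12   2013-04-14
#           6 2014-02-19   2014-02-21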
