Create new column with most recent date

Question

I have some trouble with a dataset I have in data.table. Basically, I have 2 columns: scheduled delivery date and rescheduled delivery date. However, some values are left blank. An example:

Scheduled        Rescheduled
NA               NA
2016-11-14       2016-11-17
2016-11-15       NA
2016-11-13       2016-11-11
NA               2016-11-15

I want to create a new column, which indicates the most recent date of both columns, for instance named max_scheduled_date. Therefore, if Rescheduled is NA, the max_scheduled_date should take the value of Scheduled, whilst max_scheduled_date should take the value of Rescheduled if Scheduled is NA. When both columns are NA, max_scheduled_date should obviously take NA. When both columns have a date, it should take the most recent one. I have a lot of problems creating this and do not get the results I want.

The dates are POSIXct, which gives me some trouble unfortunately.

Can someone help me out? Thank you in advance, Kind regards, Amanda

Recommended answer

As the question is tagged with data.table, here is also a data.table solution.

pmax() seems to work sufficiently well with POSIXct. Therefore, I see no reason to coerce the date columns from POSIXct to Date class.

setDT(DF)[, max_scheduled_date := pmax(Scheduled, Rescheduled, na.rm = TRUE)]
DF

    Scheduled Rescheduled max_scheduled_date
1:       <NA>        <NA>               <NA>
2: 2016-11-14  2016-11-17         2016-11-17
3: 2016-11-15        <NA>         2016-11-15
4: 2016-11-13  2016-11-11         2016-11-13
5:       <NA>  2016-11-15         2016-11-15
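To see why `na.rm = TRUE` gives the requested behaviour, `pmax()` can be tried on two plain POSIXct vectors outside of data.table. The sketch below is only an illustration: the vector names `scheduled`, `rescheduled` and `latest` are made up here, and the UTC time zone is an assumption, since the question does not state one.

```r
# Hypothetical vectors mirroring the question's table (time zone assumed to be UTC)
scheduled   <- as.POSIXct(c(NA, "2016-11-14", "2016-11-15", "2016-11-13", NA),
                          tz = "UTC")
rescheduled <- as.POSIXct(c(NA, "2016-11-17", NA, "2016-11-11", "2016-11-15"),
                          tz = "UTC")

# Element-wise maximum; with na.rm = TRUE a single NA is ignored,
# and the result is NA only where both inputs are NA
latest <- pmax(scheduled, rescheduled, na.rm = TRUE)

class(latest)                            # still POSIXct, no coercion to Date
format(latest, "%Y-%m-%d", tz = "UTC")
```

Without `na.rm = TRUE`, any row with an NA in either column would come back as NA, which is not what the question asks for.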

Note that the new column is appended by reference, i.e., without copying the whole object.
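One way to observe the by-reference behaviour is to bind a second name to the same table before calling `:=`. This is a minimal sketch on a toy table, not the question's data; the names `DT`, `alias` and `snapshot` are hypothetical.

```r
library(data.table)

# Hypothetical toy table, not the question's data
DT <- data.table(a = 1:3)

# Plain assignment binds a second name to the same object; no copy is made
alias <- DT

# := adds the column in place, so the change should show up under both names
DT[, b := a * 2L]
names(alias)

# An explicit copy() is an independent object and is not touched by later := calls
snapshot <- copy(DT)
DT[, c := a + b]
names(snapshot)
```

If an untouched version of the table is needed, take it with `copy()` before modifying, as in the second half of the sketch.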

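# Data: builds the sample data.frame used above; run this before the solution.
# Both columns are coerced to POSIXct to match the question.
library(data.table)
DF <- setDF(fread(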
  "Scheduled        Rescheduled
NA               NA
2016-11-14       2016-11-17
2016-11-15       NA
2016-11-13       2016-11-11
NA               2016-11-15"
)[, lapply(.SD, as.POSIXct)])
str(DF)

'data.frame': 5 obs. of  2 variables:
 $ Scheduled  : POSIXct, format: NA "2016-11-14" "2016-11-15" ...
 $ Rescheduled: POSIXct, format: NA "2016-11-17" NA ...
