Create new column with most recent date
Question
I am having some trouble with a dataset stored as a data.table. Basically, I have 2 columns: scheduled delivery date and rescheduled delivery date. However, some values are left blank. An example:
Scheduled Rescheduled
NA NA
2016-11-14 2016-11-17
2016-11-15 NA
2016-11-13 2016-11-11
NA 2016-11-15
I want to create a new column, named for instance max_scheduled_date, which holds the most recent date of the two columns. If Rescheduled is NA, max_scheduled_date should take the value of Scheduled; if Scheduled is NA, it should take the value of Rescheduled. When both columns are NA, max_scheduled_date should be NA as well. When both columns have a date, it should take the most recent one. I have had a lot of problems creating this and do not get the results I want.
The dates are POSIXct, which unfortunately gives me some trouble.
Can someone help me out? Thank you in advance. Kind regards, Amanda
Recommended answer
As the question is tagged with data.table, here is a data.table solution as well.
pmax() seems to work sufficiently well with POSIXct. Therefore, I see no reason to coerce the date columns from POSIXct to Date class.
setDT(DF)[, max_scheduled_date := pmax(Scheduled, Rescheduled, na.rm = TRUE)]
DF
Scheduled Rescheduled max_scheduled_date
1: <NA> <NA> <NA>
2: 2016-11-14 2016-11-17 2016-11-17
3: 2016-11-15 <NA> 2016-11-15
4: 2016-11-13 2016-11-11 2016-11-13
5: <NA> 2016-11-15 2016-11-15
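To make the role of na.rm = TRUE in the call above more concrete, here is a minimal base R sketch. The vectors scheduled and rescheduled are hypothetical stand-ins for the two columns (they are not part of the original answer), and the UTC time zone is an assumption made only to keep the example reproducible.

```r
# Hypothetical sample vectors mirroring the example table (illustration only)
scheduled   <- as.POSIXct(c(NA, "2016-11-14", "2016-11-15", "2016-11-13", NA),
                          tz = "UTC")
rescheduled <- as.POSIXct(c(NA, "2016-11-17", NA, "2016-11-11", "2016-11-15"),
                          tz = "UTC")

# na.rm = TRUE: an NA on one side is ignored, so the other date is kept;
# the result is NA only where both inputs are NA
latest <- pmax(scheduled, rescheduled, na.rm = TRUE)

# na.rm = FALSE (the default): any NA in a row makes the result NA
strict <- pmax(scheduled, rescheduled)

print(latest)
print(strict)
```

With the default na.rm = FALSE, the rows where only one date is known would come back as NA, which is not what the question asks for.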
Note that the new column is appended by reference, i.e., without copying the whole object.
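As a rough illustration of what "by reference" means here, the following sketch uses a hypothetical toy table (the names DT, alias and snapshot are assumptions, not from the original answer). A plain assignment of a data.table does not copy it, so a column added with := should show up under the second name too, whereas an explicit copy() taken beforehand stays unchanged.

```r
library(data.table)

# Hypothetical toy table, for illustration only
DT <- data.table(
  Scheduled   = as.POSIXct(c("2016-11-14", NA), tz = "UTC"),
  Rescheduled = as.POSIXct(c("2016-11-17", "2016-11-15"), tz = "UTC")
)

alias    <- DT        # plain assignment: no copy, both names refer to one table
snapshot <- copy(DT)  # explicit deep copy, independent of DT

# := adds the column in place instead of building a modified copy
DT[, max_scheduled_date := pmax(Scheduled, Rescheduled, na.rm = TRUE)]

names(alias)     # includes max_scheduled_date
names(snapshot)  # still only the two original columns
```

If the original table must stay untouched, taking a copy() before using := is the usual way to get that.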
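Data used in the example above:
# Read the sample table from inline text, convert both columns to POSIXct,
# then turn the result back into a plain data.frame
DF <- setDF(fread(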
"Scheduled Rescheduled
NA NA
2016-11-14 2016-11-17
2016-11-15 NA
2016-11-13 2016-11-11
NA 2016-11-15"
)[, lapply(.SD, as.POSIXct)])
str(DF)
'data.frame': 5 obs. of 2 variables:
$ Scheduled : POSIXct, format: NA "2016-11-14" "2016-11-15" ...
$ Rescheduled: POSIXct, format: NA "2016-11-17" NA ...