Cannot assign columns as.Date by reference in data.table
Problem description
I'm assigning a new column as Date or IDate while using by =. It's creating an integer column, not a Date as expected.

    require(data.table)
    dt <- data.table(date = as.IDate(sample(10000:11000, 10), origin = "1970-01-01"))
    dt[, group := rep(1:2, 5)]
    print(dt)
    #           date group
    #  1: 1997-06-12     1
    #  2: 1998-02-19     2
    #  3: 1998-04-25     1
    #  4: 1998-01-27     2
    #  5: 1997-10-29     1
    #  6: 1998-05-08     2
    #  7: 1999-05-09     1
    #  8: 1999-06-26     2
    #  9: 1997-11-01     1
    # 10: 1997-07-19     2
This works:
    dt[, min.date := min(date)]
    print(dt)
    #           date group   min.date
    #  1: 1997-06-12     1 1997-06-12
    #  2: 1998-02-19     2 1997-06-12
    #  3: 1998-04-25     1 1997-06-12
    #  4: 1998-01-27     2 1997-06-12
    #  5: 1997-10-29     1 1997-06-12
    #  6: 1998-05-08     2 1997-06-12
    #  7: 1999-05-09     1 1997-06-12
    #  8: 1999-06-26     2 1997-06-12
    #  9: 1997-11-01     1 1997-06-12
    # 10: 1997-07-19     2 1997-06-12
But here's the problem:
    dt[, min.group.date := as.IDate(min(date)), by = group]
    print(dt)
    #           date group   min.date min.group.date
    #  1: 1997-06-12     1 1997-06-12          10024
    #  2: 1998-02-19     2 1997-06-12          10061
    #  3: 1998-04-25     1 1997-06-12          10024
    #  4: 1998-01-27     2 1997-06-12          10061
    #  5: 1997-10-29     1 1997-06-12          10024
    #  6: 1998-05-08     2 1997-06-12          10061
    #  7: 1999-05-09     1 1997-06-12          10024
    #  8: 1999-06-26     2 1997-06-12          10061
    #  9: 1997-11-01     1 1997-06-12          10024
    # 10: 1997-07-19     2 1997-06-12          10061
min.group.date is numeric instead of Date.

    dt[, class(min.group.date)]
    # [1] "numeric"
If I initialize the column as a Date or IDate, it works as expected:

    dt <- data.table(date = as.IDate(sample(10000:11000, 10), origin = "1970-01-01"))
    dt[, group := rep(1:2, 5)]
    dt[, min.group.date := as.IDate(NA)]
    dt[, min.group.date := min(date), by = group]
    dt[, class(min.group.date)]
    # [1] "IDate" "Date"
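The pre-initialization workaround above can be reproduced deterministically. The following is a minimal sketch, assuming a reasonably recent data.table release; the four fixed dates are made up for illustration (they are not from the original post) so that the result does not depend on sample().

```r
library(data.table)

# Hypothetical fixed dates, chosen only so the output is reproducible.
dt <- data.table(
  date  = as.IDate(c("1997-06-12", "1998-02-19", "1998-04-25", "1998-01-27")),
  group = rep(1:2, 2)
)

# Create the column with the target class first ...
dt[, min.group.date := as.IDate(NA)]
# ... then fill it by group; the existing column keeps its IDate class.
dt[, min.group.date := min(date), by = group]

class(dt$min.group.date)
print(dt)
```

Because the column already exists with class IDate when the grouped assignment runs, the per-group minima are written into it and still print as dates.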
Solution

Paul, if all you want is the minimum date for each group, this line will do it:

    dt[,min(date),by=group]
You should see (the dates below obviously differ from yours because of the 'sample' command in your example):

       group         V1
    1:     1 1997-11-19
    2:     2 1997-12-04
If you want to see every row you can join the tables:
    setkey(dt,group) #always good practice
    dt_min=dt[,min(date),by=group]
    setnames(dt_min,"V1","min.group.Date") #you should NOT use colnames (see help('setnames'))
    dt[dt_min]

        group       date min.group.Date
     1:     1 1999-01-30     1997-11-19
     2:     1 1999-11-27     1997-11-19
     3:     1 1999-11-11     1997-11-19
     4:     1 1997-11-19     1997-11-19
     5:     1 1999-05-06     1997-11-19
     6:     2 1999-07-11     1997-12-04
     7:     2 1997-12-04     1997-12-04
     8:     2 1998-07-28     1997-12-04
     9:     2 1998-10-23     1997-12-04
    10:     2 1998-06-01     1997-12-04
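A variant of the same join, sketched under the assumption that your data.table version supports the on= argument: the aggregate is named directly inside .() so no setnames() call is needed, and on = "group" replaces the setkey() step. The fixed dates and the names dt_min and joined are illustrative only.

```r
library(data.table)

# Hypothetical fixed dates, chosen only so the output is reproducible.
dt <- data.table(
  date  = as.IDate(c("1997-06-12", "1998-02-19", "1998-04-25", "1998-01-27")),
  group = rep(1:2, 2)
)

# Name the aggregate directly instead of renaming V1 afterwards.
dt_min <- dt[, .(min.group.date = min(date)), by = group]

# Join the per-group minima back onto every row, without setting a key.
joined <- dt[dt_min, on = "group"]
print(joined)
```

Since the minima are computed in a separate table and joined back, the date class of the aggregated column carries over to the result.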