Cannot assign columns as.Date by reference in data.table


Problem description


I'm assigning a new column as Date or IDate while using by =. It's creating an integer column, not a Date as expected.

library(data.table)
dt <- data.table(date = as.IDate(sample(10000:11000, 10), 
                                 origin = "1970-01-01"))
dt[, group := rep(1:2, 5)]
print(dt)

#           date group
#  1: 1997-06-12     1
#  2: 1998-02-19     2
#  3: 1998-04-25     1
#  4: 1998-01-27     2
#  5: 1997-10-29     1
#  6: 1998-05-08     2
#  7: 1999-05-09     1
#  8: 1999-06-26     2
#  9: 1997-11-01     1
# 10: 1997-07-19     2

This works:

dt[, min.date := min(date)]
print(dt)

#           date group   min.date
#  1: 1997-06-12     1 1997-06-12
#  2: 1998-02-19     2 1997-06-12
#  3: 1998-04-25     1 1997-06-12
#  4: 1998-01-27     2 1997-06-12
#  5: 1997-10-29     1 1997-06-12
#  6: 1998-05-08     2 1997-06-12
#  7: 1999-05-09     1 1997-06-12
#  8: 1999-06-26     2 1997-06-12
#  9: 1997-11-01     1 1997-06-12
# 10: 1997-07-19     2 1997-06-12

But here's the problem:

dt[, min.group.date := as.IDate(min(date)), by = group]
print(dt)

#           date group   min.date min.group.date
#  1: 1997-06-12     1 1997-06-12          10024
#  2: 1998-02-19     2 1997-06-12          10061
#  3: 1998-04-25     1 1997-06-12          10024
#  4: 1998-01-27     2 1997-06-12          10061
#  5: 1997-10-29     1 1997-06-12          10024
#  6: 1998-05-08     2 1997-06-12          10061
#  7: 1999-05-09     1 1997-06-12          10024
#  8: 1999-06-26     2 1997-06-12          10061
#  9: 1997-11-01     1 1997-06-12          10024
# 10: 1997-07-19     2 1997-06-12          10061

min.group.date is numeric instead of Date.

dt[, class(min.group.date)]

# [1] "numeric"

If I initialize the column as a Date or IDate, it works as expected:

dt <- data.table(date = as.IDate(sample(10000:11000, 10), origin = "1970-01-01"))
dt[, group := rep(1:2, 5)]

dt[, min.group.date := as.IDate(NA)]
dt[, min.group.date := min(date), by = group]

dt[, class(min.group.date)]
# [1] "IDate" "Date"
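
A related defensive pattern, offered only as a sketch: run the grouped assignment first, then check the class and restore it by reference if it was dropped. The seed is an assumption added for reproducibility and is not part of the original example. In a data.table version that already keeps the class, the `if` branch simply does not run.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
dt <- data.table(date = as.IDate(sample(10000:11000, 10), origin = "1970-01-01"))
dt[, group := rep(1:2, 5)]

# Grouped assignment by reference, as in the question
dt[, min.group.date := min(date), by = group]

# If the Date class was lost, replace the whole column with an IDate version
if (!inherits(dt$min.group.date, "Date")) {
  dt[, min.group.date := as.IDate(min.group.date, origin = "1970-01-01")]
}

class(dt$min.group.date)
```

The conversion is done without `by`, so the whole column is replaced in one step and its type is allowed to change.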

Solution

Paul, if all you want is the minimum date for each group, this line will do it:

dt[,min(date),by=group]

You should see something like this (the dates differ from yours because your example uses sample() without a seed):

   group         V1
1:     1 1997-11-19
2:     2 1997-12-04

If you want the group minimum on every row, you can join the summary table back to the original:

setkey(dt, group)  # key dt by group so that dt[dt_min] joins on it
dt_min <- dt[, min(date), by = group]
setnames(dt_min, "V1", "min.group.Date")  # use setnames() rather than colnames<- (see help("setnames"))
dt[dt_min]


    group       date min.group.Date
 1:     1 1999-01-30     1997-11-19
 2:     1 1999-11-27     1997-11-19
 3:     1 1999-11-11     1997-11-19
 4:     1 1997-11-19     1997-11-19
 5:     1 1999-05-06     1997-11-19
 6:     2 1999-07-11     1997-12-04
 7:     2 1997-12-04     1997-12-04
 8:     2 1998-07-28     1997-12-04
 9:     2 1998-10-23     1997-12-04
10:     2 1998-06-01     1997-12-04
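
The same join can also add the column to `dt` by reference instead of printing a new table. The following is a minimal sketch, assuming a data.table version that supports the `on=` argument (1.9.6 or later); the seed and the choice to name the summary column directly inside `list()` are illustrative and not part of the original answer.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
dt <- data.table(date = as.IDate(sample(10000:11000, 10), origin = "1970-01-01"))
dt[, group := rep(1:2, 5)]

# Per-group minimum, named up front so no setnames() call is needed
dt_min <- dt[, list(min.group.Date = min(date)), by = group]

# Update join: look up each row's group in dt_min and assign by reference.
# The i. prefix refers to the column coming from dt_min.
dt[dt_min, min.group.Date := i.min.group.Date, on = "group"]

class(dt$min.group.Date)
```

Because the new column is copied from an existing IDate vector rather than built up group by group, it carries the class along with the values.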
