How to change the last value in each group by reference, in data.table

Problem description

For a data.table DT grouped by site, sorted by time t, I need to change the last value of a variable in each group. I assume it should be possible to do this by reference using :=, but I haven't found a way that works yet.

Sample data:

require(data.table)   # using 1.8.11 
DT <- data.table(site=c(rep("A",5), rep("B",4)),t=c(1:5,1:4),a=as.double(c(11:15,21:24)))
setkey(DT, site, t)
DT
#    site t  a
# 1:    A 1 11
# 2:    A 2 12
# 3:    A 3 13
# 4:    A 4 14
# 5:    A 5 15
# 6:    B 1 21
# 7:    B 2 22
# 8:    B 3 23
# 9:    B 4 24

The desired result is to change the last value of a in each group, for example to 999, so the result looks like:

#    site t   a
# 1:    A 1  11
# 2:    A 2  12
# 3:    A 3  13
# 4:    A 4  14
# 5:    A 5 999
# 6:    B 1  21
# 7:    B 2  22
# 8:    B 3  23
# 9:    B 4 999

It seems like .I and/or .N should be used, but I haven't found a form that works. The use of := in the same statement as .I[.N] gives an error. The following gives me the row numbers where the assignment is to be made:

DT[, .I[.N], by=site]
#    site V1
# 1:    A  5
# 2:    B  9

but I don't seem to be able to use this with a := assignment. None of the following work (two return a null data.table and one gives an error):

DT[.N, a:=999, by=site]
# Null data.table (0 rows and 0 cols)

DT[, .I[.N, a:=999], by=site]
# Error in `:=`(a, 999) : 
#   := and `:=`(...) are defined for use in j, once only and in particular ways.
#  See help(":="). Check is.data.table(DT) is TRUE.

DT[.I[.N], a:=999, by=site]
# Null data.table (0 rows and 0 cols)

Is there a way to do this by reference in data.table? Or is this better done another way in R?

Recommended answer

Currently you can use:

DT[DT[, .I[.N], by = site][['V1']], a := 999]
# or, avoiding the overhead of a second call to `[.data.table`
set(DT, i = DT[,.I[.N],by='site'][['V1']], j = 'a', value = 999L)
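
The one-liner above does two things: the inner query collects the row number of each group's last row, and the outer call assigns to only those rows. A minimal sketch that spells out both steps on the question's sample data (the intermediate name `last_rows` is introduced here purely for illustration):

```r
library(data.table)

# Sample data from the question
DT <- data.table(site = c(rep("A", 5), rep("B", 4)),
                 t = c(1:5, 1:4),
                 a = as.double(c(11:15, 21:24)))
setkey(DT, site, t)

# Step 1: within each group, .I holds that group's row numbers in DT,
# so .I[.N] is the row number of the group's last row.
last_rows <- DT[, .I[.N], by = site][["V1"]]

# Step 2: assign by reference to those rows only.
DT[last_rows, a := 999]
DT
```

Because only the selected rows are written, the other values of a are left untouched.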

Alternative approaches:

Use replace()...

DT[, a := replace(a, .N, 999), by = site]

or shift the replacement to the RHS, wrapped by {} and return the full vector

DT[, a := {a[.N] <- 999L; a}, by = site]
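
Both grouped forms above hand a full-length vector back to := for every group, so the whole column is rewritten rather than only the last rows. A small sketch, assuming the question's sample data, that checks the replace() variant changes nothing except each group's last value (the name `before` is illustrative only):

```r
library(data.table)

# Sample data from the question
DT <- data.table(site = c(rep("A", 5), rep("B", 4)),
                 t = c(1:5, 1:4),
                 a = as.double(c(11:15, 21:24)))
setkey(DT, site, t)

before <- copy(DT$a)  # keep the original values for comparison

# replace(a, .N, 999) returns the group's vector with its .N-th
# (i.e. last) element set to 999
DT[, a := replace(a, .N, 999), by = site]

# Positions whose value changed
which(DT$a != before)
```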

or use mult='last' and take advantage of by-without-by. This requires the data.table to be keyed by the groups of interest.

DT[unique(site), a := 999, mult = 'last']
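
The join form can also be written with the lookup values spelled out. This is a sketch under the assumption that DT is keyed with site as its first key column, as in the question; the variable `sites` is introduced here for illustration. It relies only on mult to pick the last matching row for each value, not on any implicit grouping:

```r
library(data.table)

# Sample data from the question, keyed by site (then t)
DT <- data.table(site = c(rep("A", 5), rep("B", 4)),
                 t = c(1:5, 1:4),
                 a = as.double(c(11:15, 21:24)))
setkey(DT, site, t)

sites <- unique(DT$site)  # one lookup value per group

# Join each site to the key; mult = "last" keeps only the last
# matching row per site, and := assigns to those rows by reference.
DT[.(sites), a := 999, mult = "last"]
DT
```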

There is a feature request (#2793) to allow

DT[, a[.N] := 999]

but this has not been implemented yet.
