Using data.table to flag the first (or last) record in a group
Question
Given a sort key, is there a data.table shortcut to duplicate the first and last functionalities found in SAS and SPSS?
The pedestrian approach below flags the first record of a group.
Given the elegance of data.table (with which I'm slowly getting familiar), I'm assuming there's a shortcut using a self join and the mult argument, but I'm still trying to figure it out.
Here's the example:
require(data.table)
set.seed(123)
n <- 17
DT <- data.table(x = sample(letters[1:3], n, replace = TRUE),
                 y = sample(LETTERS[1:3], n, replace = TRUE))
sortkey <- c("x", "y")
setkeyv(DT, sortkey)
# Build a combined key and compare each row with the previous one
key <- paste(DT$x, DT$y, sep = "-")
nw <- c(TRUE, key[2:n] != key[1:(n - 1)])
DT$first <- 1 * nw
DT
Solution
Here are a couple of solutions using data.table:
## Option 1 (cleaner solution, added 2016-11-29)
uDT <- unique(DT)
DT[, c("first","last"):=0L]
DT[uDT, first:=1L, mult="first"]
DT[uDT, last:=1L, mult="last"]

## Option 2 (original answer, retained for posterity)
DT <- cbind(DT, first=0L, last=0L)
DT[DT[unique(DT),,mult="first", which=TRUE], first:=1L]
DT[DT[unique(DT),,mult="last", which=TRUE], last:=1L]

head(DT)
#      x y first last
# [1,] a A     1    1
# [2,] a B     1    1
# [3,] a C     1    0
# [4,] a C     0    1
# [5,] b A     1    1
# [6,] b B     1    1
There's obviously a lot packed into each of those lines. The key construct, though, is the following, which returns the row index of the first record in each group:
DT[unique(DT),,mult="first", which=TRUE]
# [1]  1  2  3  5  6  7 11 13 15
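As a rough illustration of that construct, the sketch below applies it to a small made-up table and cross-checks the resulting flags against duplicated(). The table toy, the variable names, and the explicit on= and by= arguments are assumptions added here for illustration and are not part of the answer above; on= assumes a data.table version that supports it.

```r
library(data.table)

# Small hypothetical table (not the question's random data), keyed on x and y.
toy <- data.table(
  x = c("a", "a", "a", "b", "b"),
  y = c("A", "A", "B", "A", "A")
)
grp <- c("x", "y")
setkeyv(toy, grp)

# One row per (x, y) group; `by` is passed explicitly so the result does not
# depend on which other columns the table happens to hold.
groups <- unique(toy, by = grp)

# Row numbers in `toy` of the first and last match for each group.
first_idx <- toy[groups, on = grp, mult = "first", which = TRUE]
last_idx  <- toy[groups, on = grp, mult = "last",  which = TRUE]

# Turn the row numbers into 0/1 flags.
toy[, c("first", "last") := 0L]
toy[first_idx, first := 1L]
toy[last_idx,  last  := 1L]

# Cross-check against duplicated(), which marks repeats within each group.
check_first <- as.integer(!duplicated(toy, by = grp))
check_last  <- as.integer(!duplicated(toy, by = grp, fromLast = TRUE))
stopifnot(identical(toy$first, check_first),
          identical(toy$last,  check_last))

print(toy)
```

On this toy table the join returns rows 1, 3, 4 for mult = "first" and rows 2, 3, 5 for mult = "last". Passing on= and by= explicitly is a design choice of the sketch: it keeps the grouping columns visible at each step instead of relying on the table's key.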