Using data.table to flag the first (or last) record in a group

Views: 132 | Posted: 2017/3/12 11:21:53 | Tags: r, data.table


Problem description

Given a sort key, is there a data.table shortcut that duplicates the first and last functionality found in SAS and SPSS?

The pedestrian approach below flags the first record of a group. 

Given the elegance of data.table (with which I'm slowly getting familiar), I'm assuming there's a shortcut using a self join & mult, but I'm still trying to figure it out. 

Here's the example:
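library(data.table)

set.seed(123)
n <- 17
DT <- data.table(x = sample(letters[1:3], n, replace = TRUE),
                 y = sample(LETTERS[1:3], n, replace = TRUE))
sortkey <- c("x", "y")
setkeyv(DT, sortkey)                      # sort DT by x, then y
key <- paste(DT$x, DT$y, sep = "-")       # one string per row identifying its group
# a row starts a group if it is row 1 or its key differs from the previous row's
nw <- c(TRUE, key[2:n] != key[1:(n - 1)])
DT$first <- 1 * nw
DT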
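
The adjacent-key comparison above amounts to asking which rows are not duplicates of an earlier row on the key columns. As a minimal sketch of that same idea (not the shortcut the question asks about), data.table's duplicated() method can express it directly, and its fromLast argument gives the matching "last" flag. The five-row table is made-up illustrative data, and the names keycols, is_first and is_last are introduced here only for the example.

```r
library(data.table)

# Small hand-made table (illustrative data, not the question's random sample)
DT <- data.table(x = c("a", "a", "a", "b", "b"),
                 y = c("A", "A", "B", "A", "A"))
setkeyv(DT, c("x", "y"))
keycols <- key(DT)

# A row is the first of its group if no earlier row has the same key values
is_first <- !duplicated(DT, by = keycols)
# Scanning from the bottom up finds the last row of each group instead
is_last  <- !duplicated(DT, by = keycols, fromLast = TRUE)

# Store both flags as 0/1 integers, as in the question
DT[, `:=`(first = as.integer(is_first), last = as.integer(is_last))]
DT
```

Computing both logical vectors before adding any columns keeps the flags dependent only on the key columns.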

Solution
Here are a couple of solutions using data.table:
## Option 1 (cleaner solution, added 2016-11-29)
uDT <- unique(DT)
DT[, c("first","last"):=0L]
DT[uDT, first:=1L, mult="first"]
DT[uDT, last:=1L, mult="last"]


## Option 2 (original answer, retained for posterity)
DT <- cbind(DT, first=0L, last=0L)
DT[DT[unique(DT),,mult="first", which=TRUE], first:=1L]
DT[DT[unique(DT),,mult="last", which=TRUE], last:=1L]

head(DT)
#      x y first last
# [1,] a A     1    1
# [2,] a B     1    1
# [3,] a C     1    0
# [4,] a C     0    1
# [5,] b A     1    1
# [6,] b B     1    1
There's obviously a lot packed into each of those lines. The key construct, though, is the following, which returns the row index of the first record in each group:
DT[unique(DT),,mult="first", which=TRUE]
# [1]  1  2  3  5  6  7 11 13 15
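
To make the index-returning join easier to follow, here is a small sketch on a hand-made table whose groups can be checked by eye. The data and the names keycols, first_idx and last_idx are assumptions for illustration only. Unlike the answer's code, the sketch passes by and on explicitly: since data.table 1.9.8, unique() on a data.table considers all columns by default rather than only the key, so naming the key columns keeps the result independent of any extra columns.

```r
library(data.table)

# Small hand-made table (illustrative data, not the question's random sample)
DT <- data.table(x = c("a", "a", "a", "b", "b"),
                 y = c("A", "A", "B", "A", "A"))
keycols <- c("x", "y")
setkeyv(DT, keycols)

# One row per key combination: (a,A), (a,B), (b,A)
uDT <- unique(DT, by = keycols)

# which = TRUE returns row numbers of DT instead of the joined rows;
# mult picks the first or last matching row for each row of uDT
first_idx <- DT[uDT, on = keycols, which = TRUE, mult = "first"]
last_idx  <- DT[uDT, on = keycols, which = TRUE, mult = "last"]

first_idx  # 1 3 4
last_idx   # 2 3 5
```

The group (a,B) has a single row, so row 3 appears in both index vectors, which is why a one-row group ends up flagged as both first and last.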


                        