How to extract the first n rows per group?


Question

I have a data.table named dt. It is sorted first by the date column (my grouping variable), then by the age column:

library(data.table)
setkeyv(dt, c("date", "age")) # Sorts table first by column "date" then by "age"
> dt
         date age     name
1: 2000-01-01   3   Andrew
2: 2000-01-01   4      Ben
3: 2000-01-01   5  Charlie
4: 2000-01-02   6     Adam
5: 2000-01-02   7      Bob
6: 2000-01-02   8 Campbell

My question: is it possible to extract the first 2 rows for each unique date? Or, phrased more generally:

How do I extract the first n rows within each group?

In this example, the result dt.f would be:

> dt.f = ???????? # function of dt to extract the first 2 rows per unique date
> dt.f
         date age   name
1: 2000-01-01   3 Andrew
2: 2000-01-01   4    Ben
3: 2000-01-02   6   Adam
4: 2000-01-02   7    Bob

P.S. Here is the code to create the data.table shown above:

install.packages("data.table")
library(data.table)
date <- c("2000-01-01","2000-01-01","2000-01-01",
    "2000-01-02","2000-01-02","2000-01-02")
age <- c(3,4,5,6,7,8)
name <- c("Andrew","Ben","Charlie","Adam","Bob","Campbell")
dt <- data.table(date, age, name)
setkeyv(dt,c("date","age")) # Sorts table first by column "date" then by "age"


Recommended answer

Yep, just use .SD and index it as needed.

  dt[, .SD[1:2], by=date]

           date age   name
  1: 2000-01-01   3 Andrew
  2: 2000-01-01   4    Ben
  3: 2000-01-02   6   Adam
  4: 2000-01-02   7    Bob
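
One caveat with indexing by 1:2: when a group has fewer rows than requested, .SD[1:n] pads the result with NA rows. Below is a minimal sketch of a variant that uses head() and keeps n as a variable. The extra 2000-01-03 row ("Dan") is made up for illustration and is not part of the original data.

```r
library(data.table)

# Toy data from the question, plus one hypothetical date with a single row.
dt <- data.table(
  date = c("2000-01-01", "2000-01-01", "2000-01-01",
           "2000-01-02", "2000-01-02", "2000-01-02",
           "2000-01-03"),
  age  = c(3, 4, 5, 6, 7, 8, 9),
  name = c("Andrew", "Ben", "Charlie", "Adam", "Bob", "Campbell", "Dan")
)
setkeyv(dt, c("date", "age"))  # sort by date, then by age

n <- 2  # number of rows to keep per group

# .SD[1:n] asks for rows 1..n even when the group is shorter,
# so the single-row group gets an all-NA second row.
padded <- dt[, .SD[1:n], by = date]

# head() returns at most n rows, so short groups are left as they are.
trimmed <- dt[, head(.SD, n), by = date]
```

If every group is known to have at least n rows, the two forms should give the same result, and the shorter .SD[1:n] is fine.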


Edited as per @eddi's suggestion.

@eddi's suggestion is spot on. For speed, use this instead:

  # .I holds each group's row numbers in dt; the inner call collects them in column V1
  dt[dt[, .I[1:2], by = date]$V1]

  # benchmark on a slightly larger data set, here named DT
  > library(microbenchmark)
  > microbenchmark(SDstyle=DT[, .SD[1:2], by=date], IStyle=DT[DT[, .I[1:2], by = date]$V1], times=200L)
  Unit: milliseconds
      expr       min        lq    median        uq      max neval
   SDstyle 13.567070 16.224797 22.170302 24.239881 88.26719   200
    IStyle  1.675185  2.018773  2.168818  2.269292 11.31072   200
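
To make the .I idiom easier to follow, here is a minimal sketch that splits it into two steps on the question's data. The names n, idx and first_n are chosen here for illustration. head() is used in place of 1:n so that a group shorter than n would not produce NA indices.

```r
library(data.table)

# Toy data from the question.
dt <- data.table(
  date = c("2000-01-01", "2000-01-01", "2000-01-01",
           "2000-01-02", "2000-01-02", "2000-01-02"),
  age  = c(3, 4, 5, 6, 7, 8),
  name = c("Andrew", "Ben", "Charlie", "Adam", "Bob", "Campbell")
)
setkeyv(dt, c("date", "age"))  # sort by date, then by age

n <- 2  # number of rows to keep per group

# Step 1: within each group, .I is the vector of that group's row numbers
# in dt. Keep the first n of them; the unnamed result lands in column V1.
idx <- dt[, head(.I, n), by = date]$V1

# Step 2: subset the full table once, using the collected row numbers.
first_n <- dt[idx]
```

The inner call only builds an integer vector of row numbers, and the table itself is subset a single time, which is a plausible reason for the speed difference seen in the benchmark above.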
