Conditionally Select Rows within a Group with Data.Table
Question
I am looking for a solution using data.table. I have a data.table with the following columns:
library(data.table)

data <- data.frame(GROUP=c(3,3,4,4,5,6),
                   YEAR=c(1979,1985,1999,2011,2012,1994),
                   NAME=c("S","A","J","L","G","A"))
data <- as.data.table(data)  # convert the data.frame to a data.table
Data.table:
GROUP YEAR NAME
3 1979 Smith
3 1985 Anderson
4 1999 James
4 2011 Liam
5 2012 George
6 1994 Adams
For each group we want to select one row using the following rule:
- If any YEAR > 2000, select the row with a YEAR above 2000.
- If no YEAR > 2000, select the row with the maximum YEAR.
Desired output:
GROUP YEAR NAME
3 1985 Anderson
4 2011 Liam
5 2012 George
6 1994 Adams
Thanks! I have been struggling with this for a while.
Recommended answer
With data.table, this should be a lot simpler if you subset using the special .I row counter:
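library(data.table)
setDT(data)  # convert to a data.table by reference
data[
  data[
    ,
    # .I holds row numbers in the whole table; return one per group
    if (any(YEAR > 2000))
      .I[which.min(2000 - YEAR)] else  # smallest 2000 - YEAR, i.e. the latest YEAR
      .I[which.max(YEAR)],
    by = GROUP
  ]$V1  # an unnamed result column is called V1
]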
# GROUP YEAR NAME
#1: 3 1985 A
#2: 4 2011 L
#3: 5 2012 G
#4: 6 1994 A
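Note that `which.min(2000 - YEAR)` picks the latest YEAR in a group, so with the sample data both branches agree. If the rule is instead read as "pick the earliest YEAR above 2000", a variant along these lines might work. This is only a sketch under that assumption: group 7 is a hypothetical extra group added purely to show the difference, and `late`, `idx` and `result` are illustrative names.

```r
library(data.table)

# Sample data from the question, plus a hypothetical group 7 with two years above 2000
data <- data.table(GROUP = c(3, 3, 4, 4, 5, 6, 7, 7),
                   YEAR  = c(1979, 1985, 1999, 2011, 2012, 1994, 2005, 2015),
                   NAME  = c("S", "A", "J", "L", "G", "A", "X", "Y"))

idx <- data[
  ,
  {
    late <- which(YEAR > 2000)           # positions within the current group
    if (length(late) > 0)
      .I[late[which.min(YEAR[late])]]    # earliest YEAR above 2000
    else
      .I[which.max(YEAR)]                # otherwise the latest YEAR
  },
  by = GROUP
]$V1

result <- data[idx]
```

For group 7 this selects the 2005 row, whereas the `which.min(2000 - YEAR)` form would select the 2015 row.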
Thanks to @r2evans for the background info:
.I is an integer vector equivalent to seq_len(nrow(x)).
Ref: http://rdrr.io/cran/data.table/man/special-symbols.html
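To see what this means inside a grouped call, the following small sketch prints `.I` next to a within-group counter. It rebuilds the question's sample data; the column names `row` and `pos` are illustrative choices, not part of the original answer.

```r
library(data.table)

# Sample data from the question
data <- data.table(GROUP = c(3, 3, 4, 4, 5, 6),
                   YEAR  = c(1979, 1985, 1999, 2011, 2012, 1994),
                   NAME  = c("S", "A", "J", "L", "G", "A"))

# row: .I, the row number in the whole table
# pos: seq_len(.N), the position inside the current group
rows <- data[, .(row = .I, pos = seq_len(.N), YEAR), by = GROUP]
print(rows)
```

The `row` column keeps counting across groups while `pos` restarts at 1 in each group, which is why `.I` values can be used to index the full table afterwards.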
So, all I'm doing here is getting the matching row index in the whole of data for each of the calculations at each by= level, then using these row indexes to subset data again.
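The same logic can be written as two separate steps, which may make the intermediate result easier to inspect. This is a sketch that rebuilds the question's sample data; `picked` and `result` are illustrative names.

```r
library(data.table)

# Sample data from the question
data <- data.table(GROUP = c(3, 3, 4, 4, 5, 6),
                   YEAR  = c(1979, 1985, 1999, 2011, 2012, 1994),
                   NAME  = c("S", "A", "J", "L", "G", "A"))

# Step 1: one row index per group; the unnamed result column is called V1
picked <- data[
  ,
  if (any(YEAR > 2000)) .I[which.min(2000 - YEAR)] else .I[which.max(YEAR)],
  by = GROUP
]

# Step 2: use those indexes to subset the original table
result <- data[picked$V1]
print(result)
```

Here `picked` has one row per GROUP, and `picked$V1` is the integer vector of row numbers passed back into `data[...]`.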