Select the row with the maximum value in each group

Question

I have a dataset with multiple observations for each subject. For each subject, I want to select the row that has the maximum value of 'pt'. For example, with the following dataset:

ID    <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)

group <- data.frame(Subject=ID, pt=Value, Event=Event)
#   Subject pt Event
# 1       1  2     1
# 2       1  3     1
# 3       1  5     2 # max 'pt' for Subject 1
# 4       2  2     1
# 5       2  5     2
# 6       2  8     1
# 7       2 17     2 # max 'pt' for Subject 2
# 8       3  3     2
# 9       3  5     2 # max 'pt' for Subject 3

Subjects 1, 2, and 3 have maximum pt values of 5, 17, and 5, respectively.

How can I first find the largest pt value for each subject, and then put that observation in another data frame? The resulting data frame should contain only the row with the largest pt value for each subject.

Recommended answer

Here's a data.table solution:

library(data.table)            # answer written against data.table 1.9.2
group <- as.data.table(group)  # convert the data.frame to a data.table

If you want to keep all the rows that attain the maximum value of pt within each group (including ties):

group[group[, .I[pt == max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2
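
To see why this works, it can help to look at the intermediate result. The following is a minimal sketch that rebuilds the question's data so it runs on its own; the variable name idx is introduced here for illustration only. Inside a data.table call, .I holds row numbers of the full table, so the inner expression returns, for each Subject, the row numbers where pt equals the group maximum. Because that column is unnamed, data.table gives it the default name V1.

```r
library(data.table)

# Rebuild the example data from the question
group <- data.table(
  Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
  Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2)
)

# Step 1: per Subject, collect the row numbers (.I) where pt is the group max.
# The unnamed result column receives the default name V1.
idx <- group[, .I[pt == max(pt)], by = Subject]
idx$V1  # row numbers 3, 7 and 9

# Step 2: subset the original table by those row numbers
result <- group[idx$V1]
result
```

Subsetting by row number in this way keeps every column of the original table without listing the columns explicitly.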

If you'd like just the first row that attains the maximum value of pt in each group:

group[group[, .I[which.max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2

In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
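
The two variants only diverge when a group contains a tie. As an illustration, here is a small sketch on hypothetical data (the table tied and its values are made up for this example and are not from the question), in which Subject 1 reaches its maximum pt twice:

```r
library(data.table)

# Hypothetical data with a tie: Subject 1 reaches its maximum pt (5) twice
tied <- data.table(
  Subject = c(1, 1, 1, 2, 2),
  pt      = c(5, 3, 5, 2, 8),
  Event   = c(1, 1, 2, 1, 2)
)

# pt == max(pt) keeps every row that ties for the group maximum
all_max <- tied[tied[, .I[pt == max(pt)], by = Subject]$V1]

# which.max(pt) keeps only the first row that reaches the maximum
first_max <- tied[tied[, .I[which.max(pt)], by = Subject]$V1]

nrow(all_max)    # 3: two rows for Subject 1, one for Subject 2
nrow(first_max)  # 2: exactly one row per Subject
```

Which form to use depends on whether duplicate maxima should appear in the result or whether exactly one row per group is required.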
