Select the row with the maximum value in each group

Question

I have a dataset with multiple observations for each subject. For each subject, I want to select the row that has the maximum value of 'pt'. For example, with the following dataset:

ID    <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)

group <- data.frame(Subject=ID, pt=Value, Event=Event)
#   Subject pt Event
# 1       1  2     1
# 2       1  3     1
# 3       1  5     2 # max 'pt' for Subject 1
# 4       2  2     1
# 5       2  5     2
# 6       2  8     1
# 7       2 17     2 # max 'pt' for Subject 2
# 8       3  3     2
# 9       3  5     2 # max 'pt' for Subject 3

Subjects 1, 2, and 3 have maximum pt values of 5, 17, and 5, respectively.

How can I first find the maximum pt value for each subject and then put that observation into another data frame? The resulting data frame should contain only the row with the maximum pt value for each subject.

Recommended answer

Here's a data.table solution:

library(data.table)  # answer was written against data.table 1.9.2
group <- as.data.table(group)

If you want to keep all the rows that match the maximum value of pt within each group (including ties):

group[group[, .I[pt == max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2
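
The one-liner above can be read as two steps. The following is a minimal sketch that splits it up, assuming the same data as in the question; the names idx and result are introduced here only for illustration. Inside a grouped call, .I holds the row numbers of the current group within the full table, and an unnamed result column is called V1 by default.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
group <- data.table(
  Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
  Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2)
)

# Step 1: per Subject, keep the row numbers (.I) where pt equals the group maximum.
# The result has two columns: Subject and V1 (the row numbers).
idx <- group[, .I[pt == max(pt)], by = Subject]

# Step 2: use those row numbers to subset the original table.
result <- group[idx$V1]

idx$V1  # 3 7 9
```

Here idx$V1 contains rows 3, 7, and 9, which are the rows marked as maxima in the question's comments.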

If you'd like just the first row with the maximum value of pt in each group:

group[group[, .I[which.max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2

In this case the two approaches give the same result, because no group in your data has more than one row with the maximum value.
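
To see where the two approaches diverge, here is a small sketch on hypothetical data (the table tied is not part of the question) in which Subject 1 has two rows tied at the maximum pt. The names all_max and first_max are likewise only illustrative.

```r
library(data.table)

# Hypothetical data: Subject 1 has two rows tied at the maximum pt = 5
tied <- data.table(
  Subject = c(1, 1, 1, 2, 2),
  pt      = c(2, 5, 5, 4, 9),
  Event   = c(1, 1, 2, 1, 2)
)

# pt == max(pt) keeps every row that matches the group maximum
all_max <- tied[tied[, .I[pt == max(pt)], by = Subject]$V1]

# which.max(pt) returns only the position of the first maximum in each group
first_max <- tied[tied[, .I[which.max(pt)], by = Subject]$V1]

nrow(all_max)    # 3: both tied rows for Subject 1, plus one row for Subject 2
nrow(first_max)  # 2: one row per Subject
```

Which form to use depends on whether tied rows are meaningful for the analysis or whether exactly one row per group is required.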
