Extract row corresponding to minimum value of a variable by group


Question

I wish to (1) group data by one variable (State), (2) within each group find the row with the minimum value of another variable (Employees), and (3) extract the entire row.

(1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't get it.

Here is an example data set:

> data
  State Company Employees
1    AK       A        82
2    AK       B       104
3    AK       C        37
4    AK       D        24
5    RI       E        19
6    RI       F       118
7    RI       G        88
8    RI       H        42

data <- structure(list(State = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
        2L), .Label = c("AK", "RI"), class = "factor"), Company = structure(1:8, .Label = c("A", 
        "B", "C", "D", "E", "F", "G", "H"), class = "factor"), Employees = c(82L, 
        104L, 37L, 24L, 19L, 118L, 88L, 42L)), .Names = c("State", "Company", 
        "Employees"), class = "data.frame", row.names = c(NA, -8L))

Calculating the min by group is easy, using aggregate:

> aggregate(Employees ~ State, data, function(x) min(x))
  State Employees
1    AK        24
2    RI        19

...or data.table:

> library(data.table)
> DT <- data.table(data)
> DT[ , list(Employees = min(Employees)), by = State]
   State Employees
1:    AK        24
2:    RI        19

But how do I extract the entire row corresponding to these min values, i.e. also including Company in the result?

Recommended answer

Slightly more elegant:

library(data.table)
DT[ , .SD[which.min(Employees)], by = State]

   State Company Employees
1:    AK       D        24
2:    RI       E        19
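
Here .SD is the subset of the data for the current group (all columns except the grouping column), and which.min(Employees) returns the position of the first minimum within that group, so .SD[...] keeps that one row per State. A minimal self-contained sketch, assuming the question's data is rebuilt by hand with data.table() rather than with structure(); the name res is only for illustration:

```r
library(data.table)

# Rebuild the example data from the question
DT <- data.table(
  State     = c("AK", "AK", "AK", "AK", "RI", "RI", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L)
)

# Position of the first minimum *within* each group
pos <- DT[ , which.min(Employees), by = State]

# Use that position to pick the full row out of each group's .SD
res <- DT[ , .SD[which.min(Employees)], by = State]
print(res)
```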


Slightly less elegant than using .SD, but a bit faster (for data with many groups):

DT[DT[ , .I[which.min(Employees)], by = State]$V1]
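
To see why this works, it can help to look at the intermediate result. .I holds the row numbers of the original table, so the inner call returns one row number per group (in a column that data.table names V1 because the expression is unnamed), and the outer call subsets DT by those row numbers. A minimal sketch, again assuming the question's data rebuilt by hand; idx and res are illustrative names:

```r
library(data.table)

# Rebuild the example data from the question
DT <- data.table(
  State     = c("AK", "AK", "AK", "AK", "RI", "RI", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L)
)

# Inner step: row number in DT of the minimum within each State
idx <- DT[ , .I[which.min(Employees)], by = State]
print(idx)

# Outer step: subset DT by those row numbers
res <- DT[idx$V1]
print(res)
```

Because only integer row numbers are built per group, rather than a small table per group as with .SD, this form tends to scale better when there are many groups.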

Also, just replace the expression which.min(Employees) with Employees == min(Employees) if your data set has multiple identical min values and you'd like to subset all of them.
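
The difference only shows up when a group has ties. A minimal sketch using hypothetical data (DT2 is not from the question; it was made up so that AK has two companies tied for the minimum):

```r
library(data.table)

# Hypothetical data with a tie in AK (not from the question)
DT2 <- data.table(
  State     = c("AK", "AK", "AK", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E"),
  Employees = c(24L, 24L, 37L, 19L, 42L)
)

# which.min() keeps only the first minimum per group
first_only <- DT2[ , .SD[which.min(Employees)], by = State]

# Employees == min(Employees) keeps every row tied for the minimum
all_ties <- DT2[ , .SD[Employees == min(Employees)], by = State]

# The same idea with the .I form
all_ties_I <- DT2[DT2[ , .I[Employees == min(Employees)], by = State]$V1]

print(first_only)
print(all_ties)
```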

See also: Subset rows corresponding to max value by group using data.table.
