Extract row corresponding to minimum value of a variable by group
Problem description
I wish to (1) group data by one variable (State), (2) within each group find the row with the minimum value of another variable (Employees), and (3) extract the entire row.
(1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't get it.
Here is an example data set:
> data
State Company Employees
1 AK A 82
2 AK B 104
3 AK C 37
4 AK D 24
5 RI E 19
6 RI F 118
7 RI G 88
8 RI H 42
data <- structure(list(State = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("AK", "RI"), class = "factor"), Company = structure(1:8, .Label = c("A",
"B", "C", "D", "E", "F", "G", "H"), class = "factor"), Employees = c(82L,
104L, 37L, 24L, 19L, 118L, 88L, 42L)), .Names = c("State", "Company",
"Employees"), class = "data.frame", row.names = c(NA, -8L))
Calculating the min by group is easy, using aggregate:
> aggregate(Employees ~ State, data, function(x) min(x))
State Employees
1 AK 24
2 RI 19
...or data.table:
> library(data.table)
> DT <- data.table(data)
> DT[ , list(Employees = min(Employees)), by = State]
State Employees
1: AK 24
2: RI 19
But how do I extract the entire row corresponding to these min values, i.e. also including Company in the result?
Recommended answer
Slightly more elegant:
library(data.table)
DT[ , .SD[which.min(Employees)], by = State]
State Company Employees
1: AK D 24
2: RI E 19
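To make the .SD one-liner easy to try on its own, here is a self-contained sketch. It assumes the data.table package is installed and rebuilds the question's data with plain character columns instead of factors (an assumption for brevity; the result is the same). Note that which.min() returns the position of the first minimum, so exactly one row per group is kept.

```r
library(data.table)

# Example data from the question, rebuilt with character columns (assumption: no factors needed)
DT <- data.table(
  State     = c("AK", "AK", "AK", "AK", "RI", "RI", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L)
)

# Within each State, .SD is that group's subset of the other columns;
# which.min() picks the position of the first minimum, so one row per group survives
res_sd <- DT[, .SD[which.min(Employees)], by = State]
print(res_sd)
```

The grouping column comes first in the result, followed by the .SD columns (Company, Employees).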
Slightly less elegant than using .SD, but a bit faster (for data with many groups):
DT[DT[ , .I[which.min(Employees)], by = State]$V1]
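The nested call is easier to follow when split into its two steps. In this sketch (assuming data.table is installed, with the example data rebuilt as character columns), the inner call only computes one row number per group from .I, the row indices of DT, and data.table names that unnamed result column V1; the outer call then pulls the full rows out in a single subset.

```r
library(data.table)

DT <- data.table(
  State     = c("AK", "AK", "AK", "AK", "RI", "RI", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L)
)

# Step 1: per group, keep the row number (from .I) at the position of the minimum;
# the unnamed j result is returned in a column called V1
idx <- DT[, .I[which.min(Employees)], by = State]
print(idx)   # AK -> row 4, RI -> row 5

# Step 2: subset the original table by those row numbers, all at once
res_i <- DT[idx$V1]
print(res_i)
```

Unlike the .SD form, the result keeps the original column order of DT.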
Also, just replace the expression which.min(Employees) with Employees == min(Employees) if your data set has multiple identical min values and you'd like to subset all of them.
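A small sketch of the difference, assuming data.table is installed. The data here is a hypothetical variant of the example in which company F also has 19 employees, so RI has a tie at its minimum. which.min() keeps only the first tied row, while the logical condition keeps every row equal to the group minimum, in both the .SD and the .I forms.

```r
library(data.table)

# Hypothetical variant of the example data: F changed from 118 to 19 to create a tie in RI
DT_ties <- data.table(
  State     = c("AK", "AK", "AK", "AK", "RI", "RI", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Employees = c(82L, 104L, 37L, 24L, 19L, 19L, 88L, 42L)
)

# which.min() returns only the first position of the minimum in each group
first_only <- DT_ties[, .SD[which.min(Employees)], by = State]

# A logical condition keeps every row equal to the group minimum
all_ties <- DT_ties[, .SD[Employees == min(Employees)], by = State]

# The same substitution inside the faster .I form
all_ties_i <- DT_ties[DT_ties[, .I[Employees == min(Employees)], by = State]$V1]

print(first_only)  # 2 rows: AK/D, RI/E
print(all_ties)    # 3 rows: AK/D, RI/E, RI/F
```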
See also: Subset rows corresponding to max value by group using data.table.
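For readers not using data.table, here is a base R sketch that is not part of the original answer. It assumes the example data is rebuilt with character columns. ave() returns, for every row, the minimum of Employees within that row's State, so comparing against it keeps the minimum rows (and, like the == variant above, all ties). The second approach joins the aggregate() result from the question back onto the data by the shared State and Employees columns.

```r
# Example data from the question, rebuilt with character columns
data <- data.frame(
  State     = c("AK", "AK", "AK", "AK", "RI", "RI", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L),
  stringsAsFactors = FALSE
)

# For every row, the minimum Employees value within that row's State
group_min <- ave(data$Employees, data$State, FUN = min)

# Keep rows equal to their group minimum (tied rows are all kept)
res_base <- data[data$Employees == group_min, ]
print(res_base)

# Alternative: merge the per-group minima back onto the full data
# (merge() joins on the common columns State and Employees)
res_merge <- merge(aggregate(Employees ~ State, data, min), data)
print(res_merge)
```

The merge() result has the key columns first (State, Employees) and is sorted by them, so its column order differs from the original data.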