Extract row corresponding to minimum value of a variable by group
Problem description
I wish to (1) group data by one variable (State), (2) within each group find the row with the minimum value of another variable (Employees), and (3) extract the entire row.
(1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't get it.
Here is a sample data set:
> data
State Company Employees
1 AK A 82
2 AK B 104
3 AK C 37
4 AK D 24
5 RI E 19
6 RI F 118
7 RI G 88
8 RI H 42
data <- structure(list(State = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("AK", "RI"), class = "factor"), Company = structure(1:8, .Label = c("A",
"B", "C", "D", "E", "F", "G", "H"), class = "factor"), Employees = c(82L,
104L, 37L, 24L, 19L, 118L, 88L, 42L)), .Names = c("State", "Company",
"Employees"), class = "data.frame", row.names = c(NA, -8L))
Calculating the minimum by group is easy using aggregate:
> aggregate(Employees ~ State, data, function(x) min(x))
State Employees
1 AK 24
2 RI 19
...or data.table:
> library(data.table)
> DT <- data.table(data)
> DT[ , list(Employees = min(Employees)), by = State]
State Employees
1: AK 24
2: RI 19
But how do I extract the entire row corresponding to these min values, i.e. also including Company in the result?
Slightly more elegant:
library(data.table)
DT[ , .SD[which.min(Employees)], by = State]
State Company Employees
1: AK D 24
2: RI E 19
Slightly less elegant than using .SD, but a bit faster (for data with many groups):
DT[DT[ , .I[which.min(Employees)], by = State]$V1]
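To see why the .I version works, it may help to split it into its two steps. The sketch below rebuilds the sample data with plain character columns instead of factors (an assumption made only to keep it short) and uses the illustrative names idx and result, which are not part of the original answer.

```r
library(data.table)

# Sample data from the question, rebuilt with character columns for brevity
DT <- data.table(
  State     = c("AK", "AK", "AK", "AK", "RI", "RI", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L)
)

# Step 1: within each State, .I holds the row numbers of that group in DT;
# which.min() picks the one belonging to the smallest Employees value.
# The unnamed result column is called V1 by default.
idx <- DT[ , .I[which.min(Employees)], by = State]

# Step 2: use those row numbers to subset the original table once
result <- DT[idx$V1]
print(result)
```

Collecting row numbers first and subsetting once avoids materialising .SD for every group, which is the likely reason this form is faster when there are many groups.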
Also, just replace the expression which.min(Employees) with Employees == min(Employees) if your data set has multiple identical min values and you'd like to subset all of them.
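The difference only shows up when a group has ties, which the sample data does not. The sketch below uses a small hypothetical table, DT_ties, invented for illustration, in which two AK companies share the minimum.

```r
library(data.table)

# Hypothetical data: companies C and D tie for the minimum in AK
DT_ties <- data.table(
  State     = c("AK", "AK", "AK", "RI", "RI"),
  Company   = c("A", "C", "D", "E", "F"),
  Employees = c(82L, 24L, 24L, 19L, 118L)
)

# which.min() returns only the first minimum, so one row per group
first_only <- DT_ties[ , .SD[which.min(Employees)], by = State]

# A logical comparison keeps every row equal to the group minimum
all_min <- DT_ties[ , .SD[Employees == min(Employees)], by = State]

# The same idea with the .I form
all_min_fast <- DT_ties[DT_ties[ , .I[Employees == min(Employees)], by = State]$V1]

print(first_only)
print(all_min)
```

Here first_only has one row per state, while all_min and all_min_fast keep both tied AK rows.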
See also Subset by group with data.table.