Extract row corresponding to minimum value of a variable by group

Problem description

I wish to (1) group data by one variable (State), (2) within each group find the row of minimum value of another variable (Employees), and (3) extract the entire row.

(1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't get it.

Here is a sample data set:

> data
  State Company Employees
1    AK       A        82
2    AK       B       104
3    AK       C        37
4    AK       D        24
5    RI       E        19
6    RI       F       118
7    RI       G        88
8    RI       H        42

data <- structure(list(State = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
        2L), .Label = c("AK", "RI"), class = "factor"), Company = structure(1:8, .Label = c("A", 
        "B", "C", "D", "E", "F", "G", "H"), class = "factor"), Employees = c(82L, 
        104L, 37L, 24L, 19L, 118L, 88L, 42L)), .Names = c("State", "Company", 
        "Employees"), class = "data.frame", row.names = c(NA, -8L))

Calculating the minimum by group is easy using aggregate:

> aggregate(Employees ~ State, data, function(x) min(x))
  State Employees
1    AK        24
2    RI        19

...or data.table:

> library(data.table)
> DT <- data.table(data)
> DT[ , list(Employees = min(Employees)), by = State]
   State Employees
1:    AK        24
2:    RI        19

But how do I extract the entire row corresponding to these min values, i.e. also including Company in the result?

Solution

Slightly more elegant:

library(data.table)
DT[ , .SD[which.min(Employees)], by = State]

   State Company Employees
1:    AK       D        24
2:    RI       E        19


Slightly less elegant than using .SD, but a bit faster (for data with many groups):

DT[DT[ , .I[which.min(Employees)], by = State]$V1]
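To make the .I approach easier to follow, here is a minimal sketch that splits the one-liner into its two steps. It assumes the data.table package is installed and rebuilds the sample data with plain character columns rather than factors; the names idx and result are illustrative only and do not come from the original answer.

```r
library(data.table)

# Rebuild the sample data from the question as a data.table.
DT <- data.table(
  State     = c("AK", "AK", "AK", "AK", "RI", "RI", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L)
)

# Step 1: within each State, .I holds the row numbers of that group in DT,
# and which.min() gives the position of the smallest Employees value.
# The result has columns State and V1, where V1 is a row number in DT.
idx <- DT[, .I[which.min(Employees)], by = State]

# Step 2: use those row numbers to subset the original table,
# which keeps every column, including Company.
result <- DT[idx$V1]
print(result)
```

The inner call only computes one integer per group, and the full rows are pulled out once at the end, which is probably why this form tends to scale better than building a .SD subset for every group.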

Also, if your data set has multiple identical minimum values within a group and you'd like to keep all of them, just replace the expression which.min(Employees) with Employees == min(Employees).
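The difference between the two expressions only shows up when a group has tied minima. The sketch below uses a small made-up table (company "X" and its value are hypothetical and were added here to create a tie; they are not part of the question's data) and assumes data.table is installed.

```r
library(data.table)

# Hypothetical data: AK has two companies tied at the minimum of 24.
DT <- data.table(
  State     = c("AK", "AK", "AK", "RI", "RI"),
  Company   = c("A", "D", "X", "E", "F"),
  Employees = c(82L, 24L, 24L, 19L, 118L)
)

# which.min() returns only the first position of the minimum,
# so each group contributes exactly one row.
first_only <- DT[DT[, .I[which.min(Employees)], by = State]$V1]

# The logical comparison keeps every row equal to the group minimum,
# so tied rows are all returned.
all_ties <- DT[DT[, .I[Employees == min(Employees)], by = State]$V1]

print(first_only)
print(all_ties)
```

Here first_only should contain companies D and E, while all_ties should additionally contain X.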

See also Subset by group with data.table.
