只保留每个因素级别的最小值 [英] Only keep min value for each factor level

查看:94
本文介绍了只保留每个因素级别的最小值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个问题,我错了一段时间...希望任何人在这里可以帮助我。

I got a problems that bugs me for some time… hopefully anybody here can help me.

我得到以下数据框架

f <- c('a','a','b','b','b','c','d','d','d','d')
v1 <- c(1.3,10,2,10,10,1.1,10,3.1,10,10)
v2 <- c(1:10)
df <- data.frame(f,v1,v2)



f是一个因素; v1和v2是值。
对于f的每个级别,我只想保留一行:在这个因子级别具有最低值v1的那个。

f is a factor; v1 and v2 are values. For each level of f, I want only want to keep one row: the one that has the lowest value of v1 in this factor level.

f   v1  v2
a   1.3 1
b   2   3
c   1.1 6
d   3.1 8

我尝试了各种各样的东西与聚合,ddply,通过,直接...但似乎没有任何工作。对于任何建议,我将非常感激。

I tried various things with aggregate, ddply, by, tapply… but nothing seems to work. For any suggestions, I would be very thankful.

推荐答案

使用DWin的解决方案,可以避免使用 $ c> ave

Using DWin's solution, tapply can be avoided using ave.

df[ df$v1 == ave(df$v1, df$f, FUN=min), ]

另外加快速度,如下所示。记住你,这也取决于水平的数量。我注意到,我注意到, ave 经常被遗忘,尽管它是R中更强大的功能之一。

This gives another speed-up, as shown below. Mind you, this is also dependent on the number of levels. I give this as I notice that ave is far too often forgotten about, although it is one of the more powerful functions in R.

f <- rep(letters[1:20],10000)
v1 <- rnorm(20*10000)
v2 <- 1:(20*10000)
df <- data.frame(f,v1,v2)

> system.time(df[ df$v1 == ave(df$v1, df$f, FUN=min), ])
   user  system elapsed 
   0.05    0.00    0.05 

> system.time(df[ df$v1 %in% tapply(df$v1, df$f, min), ])
   user  system elapsed 
   0.25    0.03    0.29 

> system.time(lapply(split(df, df$f), FUN = function(x) {
+             vec <- which(x[3] == min(x[3]))
+             return(x[vec, ])
+         })
+  .... [TRUNCATED] 
   user  system elapsed 
   0.56    0.00    0.58 

> system.time(df[tapply(1:nrow(df),df$f,function(i) i[which.min(df$v1[i])]),]
+ )
   user  system elapsed 
   0.17    0.00    0.19 

> system.time( ddply(df, .var = "f", .fun = function(x) {
+     return(subset(x, v1 %in% min(v1)))
+     }
+ )
+ )
   user  system elapsed 
   0.28    0.00    0.28 

这篇关于只保留每个因素级别的最小值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆