Aggregate with max and factors
Question
I have a data.frame with factor columns on which I want to compute a max (or min, or quantiles). These functions do not work on factors, but I would like them to.
Here is some example data:
set.seed(3)
df1 <- data.frame(id = rep(1:5,each=2),height=sample(c("low","medium","high"),size = 10,replace=TRUE))
df1$height <- factor(df1$height,c("low","medium","high"))
df1$height_num <- as.numeric(df1$height)
# > df1
# id height height_num
# 1 1 low 1
# 2 1 high 3
# 3 2 medium 2
# 4 2 low 1
# 5 3 medium 2
# 6 3 medium 2
# 7 4 low 1
# 8 4 low 1
# 9 5 medium 2
# 10 5 medium 2
I can easily do this:
aggregate(height_num ~ id,df1,max)
# id height_num
# 1 1 3
# 2 2 2
# 3 3 2
# 4 4 1
# 5 5 2
But not this:
aggregate(height ~ id,df1,max)
# Error in Summary.factor(c(2L, 2L), na.rm = FALSE) :
# ‘max’ not meaningful for factors
I want to take the biggest "height" and keep the same levels in the aggregated table as in the original table. In my real data I have many columns, and I want my factors to stay sorted so that my plots remain clean and consistent.
I can do it this way, and use the same structure with other aggregating functions as well:
use_factors <- function(x,FUN){factor(levels(x)[FUN(as.numeric(x))],levels(x))}
aggregate(height ~ id,df1,use_factors,max)
# id height
# 1 1 high
# 2 2 medium
# 3 3 medium
# 4 4 low
# 5 5 medium
Or I suppose I could overload the max, min, median and quantile functions, but I feel I'm surely reinventing the wheel.
Is there an easy way to do this?
Recommended answer
Actually, you can do the aggregation you want if you use an ordered factor. max and min are defined for ordered factors, and the result keeps the original levels.
set.seed(3)
df1 <- data.frame(id = rep(1:5,each=2),height=sample(c("low","medium","high"),size = 10,replace=TRUE))
df1$height <- factor(df1$height,c("low","medium","high"), ordered = TRUE)
df1$height_num <- as.numeric(df1$height)
aggregate(height~id, df1, max)
# id height
# 1 1 high
# 2 2 medium
# 3 3 medium
# 4 4 low
# 5 5 medium
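The question also mentions many factor columns and quantiles, so here is a minimal sketch of how the same idea might extend to them. It assumes every factor column already has its levels in a meaningful order. The data is copied by hand from the table in the question rather than sampled, and the helper name lower_median is made up for this illustration. As far as I know, median refuses factors even when they are ordered, while quantile accepts ordered factors as long as type is 1 or 3.

```r
# Example data copied from the table in the question (no random sampling,
# so the values do not depend on how sample() behaves in a given R version).
df1 <- data.frame(
  id = rep(1:5, each = 2),
  height = c("low", "high", "medium", "low", "medium",
             "medium", "low", "low", "medium", "medium")
)
df1$height <- factor(df1$height, c("low", "medium", "high"))

# Turn every factor column into an ordered factor; as.ordered() keeps the
# existing level order, so the levels must already be sorted sensibly.
is_fac <- vapply(df1, is.factor, logical(1))
df1[is_fac] <- lapply(df1[is_fac], as.ordered)

# max() and min() are defined for ordered factors.
height_max <- aggregate(height ~ id, df1, max)
height_min <- aggregate(height ~ id, df1, min)

# Hypothetical helper: quantile() with type = 1 (or 3) works on ordered
# factors, so probs = 0.5 gives a "lower median" that is always an
# existing level.
lower_median <- function(x) quantile(x, probs = 0.5, type = 1, names = FALSE)
height_med <- aggregate(height ~ id, df1, lower_median)
```

Types 1 and 3 are the discontinuous quantile definitions, which always return one of the observed values, so no averaging between two levels is ever needed.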