Sort data frame column by factor
Question
Suppose I have a data frame with 3 columns (name, y, sex) where name is character, y is a numeric value and sex is a factor.
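# sex is made an explicit factor; y is random, so its values differ on every run
sex <- factor(c("M","M","F","M","F","M","M","M","F"))
x <- c("MARK","TOM","SUSAN","LARRY","EMMA","LEONARD","TIM","MATT","VIOLET")
# name is defined here but not used below; the data frame column is called x
name <- as.character(x)
y <- rnorm(9, 8, 1)
score <- data.frame(x, y, sex)
score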
        x         y sex
1    MARK  6.767086   M
2     TOM  7.613928   M
3   SUSAN  7.447405   F
4   LARRY  8.040069   M
5    EMMA  8.306875   F
6 LEONARD  8.697268   M
7     TIM 10.385221   M
8    MATT  7.497702   M
9  VIOLET 10.177969   F
If I wanted to order it by y, I would use:
score[order(score$y),]
        x         y sex
1    MARK  6.767086   M
3   SUSAN  7.447405   F
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
5    EMMA  8.306875   F
6 LEONARD  8.697268   M
9  VIOLET 10.177969   F
7     TIM 10.385221   M
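For reference, order() does not sort the data itself: it returns the positions that would put a vector in ascending order, and those positions are then used to pick the rows. A minimal sketch with made-up values (the vector v is hypothetical, not part of the data above):

```r
# Hypothetical values, chosen only to illustrate what order() returns
v <- c(7.6, 6.8, 10.4, 7.4)

# idx holds the positions of v from smallest to largest
idx <- order(v)
idx       # 2 4 1 3

# Indexing with idx yields the sorted values, same as sort(v)
v[idx]    # 6.8 7.4 7.6 10.4
```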
So far, so good: each name keeps its correct score. But how can I reorder it so that the M and F levels are not mixed? I need to order by y and at the same time keep the factor levels separated.
Finally, I would like to go a step further and involve the character column. The example doesn't show it, but what if there were tied y values and I had to order again within each factor level (e.g. TIM and TOM both got 8.4 and I have to fall back to alphabetical order)?
I was thinking about the by function, but it creates a list and doesn't really help. I think there must be a similar function that can be applied to a data frame and returns a data frame.
To make the point clear:
sep<-split(score,score$sex)
sep$M<-sep$M[order(sep$M[,2]),]
sep$M
        x         y sex
1    MARK  6.767086   M
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
6 LEONARD  8.697268   M
7     TIM 10.385221   M
sep$F<-sep$F[order(sep$F[,2]),]
sep$F
       x         y sex
3  SUSAN  7.447405   F
5   EMMA  8.306875   F
9 VIOLET 10.177969   F
merged<-rbind(sep$M,sep$F)
merged
        x         y sex
1    MARK  6.767086   M
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
6 LEONARD  8.697268   M
7     TIM 10.385221   M
3   SUSAN  7.447405   F
5    EMMA  8.306875   F
9  VIOLET 10.177969   F
I know how to do that if the factor has 2 or 3 levels. But what if it had many levels, say 20? Should I write a for loop?
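As a side note, the split-and-rbind approach shown above can probably be generalized to any number of levels without an explicit for loop, by applying the same sort to every piece with lapply. The sketch below is only an illustration under assumptions: score2, pieces and merged2 are hypothetical names, and the y values are fixed rather than random so the result is reproducible.

```r
# Hypothetical data with fixed y values so the result is reproducible
score2 <- data.frame(
  x   = c("MARK", "TOM", "SUSAN", "LARRY", "EMMA"),
  y   = c(6.8, 7.6, 7.4, 8.0, 8.3),
  sex = factor(c("M", "M", "F", "M", "F"))
)

# Split by factor level, sort each piece by y, then stack the pieces again
pieces <- split(score2, score2$sex)
pieces <- lapply(pieces, function(d) d[order(d$y), ])
merged2 <- do.call(rbind, pieces)
merged2
```

The groups come back in the order of the factor levels (F before M here), and the row names of the result may carry the level name as a prefix.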
Recommended answer
order takes multiple arguments, and it does just what you want:
with(score, score[order(sex, y, x),])
##         x        y sex
## 3   SUSAN 6.636370   F
## 5    EMMA 6.873445   F
## 9  VIOLET 8.539329   F
## 6 LEONARD 6.082038   M
## 2     TOM 7.812380   M
## 8    MATT 8.248374   M
## 4   LARRY 8.424665   M
## 7     TIM 8.754023   M
## 1    MARK 8.956372   M
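The random data above happens to contain no ties, so the third key never comes into play. The following sketch uses made-up values (the data frame tied and its numbers are hypothetical) in which TIM and TOM share the same y, to show how the later arguments of order break ties; it also shows one common way to reverse a numeric key, by negating it.

```r
# Hypothetical data: TIM and TOM are tied at 8.4 within level M
tied <- data.frame(
  x   = c("TOM", "SUSAN", "TIM", "EMMA", "MARK"),
  y   = c(8.4, 7.4, 8.4, 8.3, 6.8),
  sex = factor(c("M", "F", "M", "F", "M")),
  stringsAsFactors = FALSE
)

# Keys are applied left to right: sex first, then y, then x to break ties
asc <- tied[order(tied$sex, tied$y, tied$x), ]
asc$x     # "SUSAN" "EMMA" "MARK" "TIM" "TOM"

# Negating a numeric key sorts that key in descending order
desc <- tied[order(tied$sex, -tied$y, tied$x), ]
desc$x    # "EMMA" "SUSAN" "TIM" "TOM" "MARK"
```

Because the grouping is just the first sort key, this works the same way for a factor with 20 levels as for one with 2.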