按因子对数据框列进行排序 [英] Sort data frame column by factor

查看:181
本文介绍了按因子对数据框列进行排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有一个包含3列(nameysex)的数据框,其中name是字符,y是一个数值,而sex是一个因数.

Supose I have a data frame with 3 columns (name, y, sex) where name is character, y is a numeric value and sex is a factor.

sex<-c("M","M","F","M","F","M","M","M","F")
x<-c("MARK","TOM","SUSAN","LARRY","EMMA","LEONARD","TIM","MATT","VIOLET")
name<-as.character(x)
y<-rnorm(9,8,1)
score<-data.frame(x,y,sex)
score
     name      y     sex
1    MARK  6.767086   M
2     TOM  7.613928   M
3   SUSAN  7.447405   F
4   LARRY  8.040069   M
5    EMMA  8.306875   F
6 LEONARD  8.697268   M
7     TIM 10.385221   M
8    MATT  7.497702   M
9  VIOLET 10.177969   F

如果要按y订购,请使用:

If I wanted to order it by y I would use:

score[order(score$y),]
        x         y sex
1    MARK  6.767086   M
3   SUSAN  7.447405   F
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
5    EMMA  8.306875   F
6 LEONARD  8.697268   M
9  VIOLET 10.177969   F
7     TIM 10.385221   M

到目前为止,太好了……名称保持正确的分数,但是如何重新排序以使M和F的水平不混合.我需要进行排序,同时将因子级别分开.

So far, so good... The names keep the correct score BUT how could I reorder it to have M and F levels not mixed. I need to order and at the same time keep factor levels separated.

最后,我想进一步涉及字符,该示例无济于事,但是如果绑定了y值,而我又必须在因数内重新排序(例如TIM和TOM分别为8.4和我必须指定字母顺序).

Finally I would like to take a step further to involve character, the example doesn't help, but what if there were tied y values and I would have to order again within factor (e.g. TIM and TOM got 8.4 and I have to assign alphabetical order).

我当时正在按功能考虑,但是它创建了一个列表,并没有帮助.我认为必须有类似的功能才能应用于数据帧并获取数据帧作为返回.

I was thinking about by function but it creates a list and doesn't help really. I think there must be some function like it to apply on data frames and get data frames as return.

明确要点:

sep<-split(score,score$sex)
sep$M<-sep$M[order(sep$M[,2]),]
sep$M
x         y sex
1    MARK  6.767086   M
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
6 LEONARD  8.697268   M
7     TIM 10.385221   M

sep$F<-sep$F[order(sep$F[,2]),]
sep$F
x         y sex
3  SUSAN  7.447405   F
5   EMMA  8.306875   F
9 VIOLET 10.177969   F

merged<-rbind(sep$M,sep$F)
merged
x         y sex
1    MARK  6.767086   M
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
6 LEONARD  8.697268   M
7     TIM 10.385221   M
3   SUSAN  7.447405   F
5    EMMA  8.306875   F
9  VIOLET 10.177969   F

如果我有2或3个因素,我知道该怎么做.但是,如果我有20个严重的因素水平,应该写一个for循环吗?

I know how to do that if I have 2 or 3 factors. But what if I had serious levels of factors, say 20, should I write a for loop?

推荐答案

order带有多个参数,并且可以满足您的要求:

order takes multiple arguments, and it does just what you want:

with(score, score[order(sex, y, x),])
##         x        y sex
## 3   SUSAN 6.636370   F
## 5    EMMA 6.873445   F
## 9  VIOLET 8.539329   F
## 6 LEONARD 6.082038   M
## 2     TOM 7.812380   M
## 8    MATT 8.248374   M
## 4   LARRY 8.424665   M
## 7     TIM 8.754023   M
## 1    MARK 8.956372   M

这篇关于按因子对数据框列进行排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆