按组排名变量(dplyr) [英] Rank variable by group (dplyr)

查看:95
本文介绍了按组排名变量(dplyr)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据列的列为 x1,x2,组,我想生成一个新的数据框,其列为 rank 表示 x1 在其组中的顺序。

I have a dataframe with columns x1, x2, group and I would like to generate a new dataframe with an extra column rank that indicates the order of x1 in its group.

有一个相关问题此处,但是接受的答案似乎不再起作用。

There is a related question here, but the accepted answer does not seem to work anymore.

直到此处,都可以:

library(dplyr)
data(iris)
by_species <- iris %>% 
              arrange(Species, Sepal.Length) %>% 
              group_by(Species)  

但是当我尝试按组获取排名时:

But when I try to get the ranks by group:

by_species <- mutate(by_species, rank=row_number())

错误是:


等级错误(x,ties.method = first,na.last = keep):

参数 x丢失,没有默认值

Error in rank(x, ties.method = "first", na.last = "keep") :
argument "x" is missing, with no default

更新

问题是 dplyr plyr 。要重现该错误,请装入两个软件包:

The problem was some conflict between dplyr and plyr. To reproduce the error, load both packages:

library(dplyr)
library(plyr)
data(iris)
by_species <- iris %>% 
              arrange(Species, Sepal.Length) %>% 
              group_by(Species) %>% 
              mutate(rank=row_number())
# Error in rank(x, ties.method = "first", na.last = "keep") : 
# argument "x" is missing, with no default

卸载 plyr 的工作原理如下:

detach("package:plyr", unload=TRUE)
by_species <- iris %>% 
              arrange(Species, Sepal.Length) %>% 
              group_by(Species) %>% 
              mutate(rank=row_number())

by_species %>% filter(rank <= 3)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  rank
##          (dbl)       (dbl)        (dbl)       (dbl)     (fctr) (int)
## 1          4.3         3.0          1.1         0.1     setosa     1
## 2          4.4         2.9          1.4         0.2     setosa     2
## 3          4.4         3.0          1.3         0.2     setosa     3
## 4          4.9         2.4          3.3         1.0 versicolor     1
## 5          5.0         2.0          3.5         1.0 versicolor     2
## 6          5.0         2.3          3.3         1.0 versicolor     3
## 7          4.9         2.5          4.5         1.7  virginica     1
## 8          5.6         2.8          4.9         2.0  virginica     2
## 9          5.7         2.5          5.0         2.0  virginica     3


推荐答案

library(dplyr)

by_species <- iris %>% arrange(Species, Sepal.Length) %>%
    group_by(Species) %>% 
    mutate(rank = rank(Sepal.Length, ties.method = "first"))

by_species %>% filter(rank <= 3)
##Source: local data frame [9 x 6]
##Groups: Species [3]
##
##  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  rank
##         (dbl)       (dbl)        (dbl)       (dbl)     (fctr) (int)
##1          4.3         3.0          1.1         0.1     setosa     1
##2          4.4         2.9          1.4         0.2     setosa     2
##3          4.4         3.0          1.3         0.2     setosa     3
##4          4.9         2.4          3.3         1.0 versicolor     1
##5          5.0         2.0          3.5         1.0 versicolor     2
##6          5.0         2.3          3.3         1.0 versicolor     3
##7          4.9         2.5          4.5         1.7  virginica     1
##8          5.6         2.8          4.9         2.0  virginica     2
##9          5.7         2.5          5.0         2.0  virginica     3

by_species %>% slice(1:3)
##Source: local data frame [9 x 6]
##Groups: Species [3]
##
##  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  rank
##         (dbl)       (dbl)        (dbl)       (dbl)     (fctr) (int)
##1          4.3         3.0          1.1         0.1     setosa     1
##2          4.4         2.9          1.4         0.2     setosa     2
##3          4.4         3.0          1.3         0.2     setosa     3
##4          4.9         2.4          3.3         1.0 versicolor     1
##5          5.0         2.0          3.5         1.0 versicolor     2
##6          5.0         2.3          3.3         1.0 versicolor     3
##7          4.9         2.5          4.5         1.7  virginica     1
##8          5.6         2.8          4.9         2.0  virginica     2
##9          5.7         2.5          5.0         2.0  virginica     3

这篇关于按组排名变量(dplyr)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆