How to rank within groups in R?
Question
OK, check out this data frame...
customer_name order_dates order_values
1 John 2010-11-01 15
2 Bob 2008-03-25 12
3 Alex 2009-11-15 5
4 John 2012-08-06 15
5 John 2015-05-07 20
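To make the example reproducible, here is one way to build the data frame above in R. It assumes the object is named `df` (the name the answer uses) and that `order_dates` is stored as a `Date`; the question does not say which class the dates have.

```r
# Rebuild the example data; column names are taken from the printed output.
# Storing order_dates as Date (not character) is an assumption.
df <- data.frame(
  customer_name = c("John", "Bob", "Alex", "John", "John"),
  order_dates   = as.Date(c("2010-11-01", "2008-03-25", "2009-11-15",
                            "2012-08-06", "2015-05-07")),
  order_values  = c(15, 12, 5, 15, 20),
  stringsAsFactors = FALSE  # keep customer_name as character in R < 4.0 too
)
str(df)
```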
Let's say I want to add a variable that ranks each customer's orders by order value (highest first), using the most recent order date as the tie-breaker. So, ultimately the data should look like this:
customer_name order_dates order_values ranked_order_values_by_max_value_date
1 John 2010-11-01 15 3
2 Bob 2008-03-25 12 1
3 Alex 2009-11-15 5 1
4 John 2012-08-06 15 2
5 John 2015-05-07 20 1
A customer with a single order gets rank 1, and a customer with several orders has them ranked by value, with the more recent order taking priority when two values tie. In this example, John's 8/6/2012 order gets rank 2 because it was placed after his 11/1/2010 order of the same value. The 5/7/2015 order is rank 1 because it was the biggest. So even if that order had been placed 20 years ago, it would still be rank 1, because it was John's highest order value.
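As a rough illustration of that rule, the sketch below sorts only John's three orders: highest value first, and for equal values the later date first. Negating the value and the numeric date is just one way to get a descending sort in base R; the small `john` data frame is a hypothetical subset built for this example.

```r
# John's three orders, in the row order used in the question (hypothetical subset).
john <- data.frame(
  order_dates  = as.Date(c("2010-11-01", "2012-08-06", "2015-05-07")),
  order_values = c(15, 15, 20)
)

# Highest value first; for equal values, the most recent date comes first.
# Negating both keys makes order() sort them in descending order.
sorted <- john[order(-john$order_values, -as.numeric(john$order_dates)), ]

# After sorting, the rank is simply the position in the sorted table.
sorted$rank <- seq_len(nrow(sorted))
print(sorted)
```

The 2015 order (value 20) comes first, and the 2012 order beats the 2010 order on the date tie-breaker, which gives ranks 1, 2 and 3 as described.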
Does anyone know how I can do this in R, i.e., rank rows within groups defined by specified variables in a data frame?
Thanks for your help!
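You can do this pretty cleanly with dplyr:
library(dplyr)
df %>%
  group_by(customer_name) %>%
  # order() sorts by order_values, then order_dates, both descending;
  # applying order() again turns that permutation into per-row ranks.
  mutate(my_ranks = order(order(order_values, order_dates, decreasing = TRUE)))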
Source: local data frame [5 x 4]
Groups: customer_name
customer_name order_dates order_values my_ranks
1 John 2010-11-01 15 3
2 Bob 2008-03-25 12 1
3 Alex 2009-11-15 5 1
4 John 2012-08-06 15 2
5 John 2015-05-07 20 1
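The double `order()` is what turns a sort into a rank: `order()` returns the row positions in sorted order, and calling `order()` on that permutation inverts it, giving each row its rank. The sketch below shows this on John's orders listed in a hypothetical row order (chosen so the permutation and the ranks differ), then reproduces the grouped result in base R with `ave()` instead of dplyr. The `ave()` version is an alternative of my own, not part of the original answer, and it again assumes `order_dates` is a `Date`.

```r
# Example data from the question; storing order_dates as Date is an assumption.
df <- data.frame(
  customer_name = c("John", "Bob", "Alex", "John", "John"),
  order_dates   = as.Date(c("2010-11-01", "2008-03-25", "2009-11-15",
                            "2012-08-06", "2015-05-07")),
  order_values  = c(15, 12, 5, 15, 20),
  stringsAsFactors = FALSE
)

# John's orders in a hypothetical row order: (15, 2010), (20, 2015), (15, 2012).
perm <- order(c(15, 20, 15),
              as.Date(c("2010-11-01", "2015-05-07", "2012-08-06")),
              decreasing = TRUE)
print(perm)          # sorted positions: 2 3 1
print(order(perm))   # inverted permutation = ranks: 3 1 2

# Base-R grouped ranking: ave() splits the row indices by customer and
# applies the double order() to each customer's rows separately.
df$my_ranks <- ave(
  seq_len(nrow(df)), df$customer_name,
  FUN = function(i) order(order(df$order_values[i], df$order_dates[i],
                                decreasing = TRUE))
)
print(df)
```

Because `ave()` writes each group's result back into the original row positions, the rows keep their original order, just as with the grouped `mutate()` above.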