Rank by two columns and keep ties


Question

My question is related to a linked question.

I have a dataset such as this one:

 ID    |     Date 

  A        01/01/2015
  A        02/01/2015
  A        02/01/2015
  A        02/01/2015
  A        05/01/2015     
  B        01/01/2015

Within each ID, I want to rank each date by its distance from a reference date, 31/01/2015. The date closest to the reference date is ranked 1, the next closest 2, and so on. The result would look like:

  ID    |     Date           |  Sequence

  A        01/01/2015           3
  A        02/01/2015           2
  A        02/01/2015           2
  A        02/01/2015           2
  A        05/01/2015           1  
  B        01/01/2015          ...

While the rank function does this, I also want to keep all the ties, so that rows with the same date share the same rank. How do I do that?

Also, I am working with a huge dataset - approx. 300 million rows. So the solution would ideally be fast.

Recommended answer

We can use frank from data.table with ties.method = "dense", grouped by 'ID', on the absolute difference between 'Date' and the reference date ('2015-01-31'). Dense ranking gives tied values the same rank and leaves no gaps after a tie.

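library(data.table)
# df is the sample data.frame defined in the data section below.
# Convert it to a data.table by reference, then, within each ID, dense-rank
# the absolute distance of each Date from the reference date.
setDT(df)[, Sequence := frank(abs(as.IDate(Date, "%d/%m/%Y") -
              as.IDate("2015-01-31")), ties.method = "dense"), by = ID]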
df
#    ID       Date Sequence
#1:  A 01/01/2015        3
#2:  A 02/01/2015        2
#3:  A 02/01/2015        2
#4:  A 02/01/2015        2
#5:  A 05/01/2015        1
#6:  B 01/01/2015        1
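To illustrate why "dense" is the tie method that matches the expected output, here is a minimal base R sketch. The vector d is an illustrative stand-in (not from the original answer) holding the day differences between 31/01/2015 and the five 'A' dates; the match/sort/unique idiom is one common way to emulate dense ranking without data.table.

```r
# Days between 31/01/2015 and the dates 01/01, 02/01, 02/01, 02/01, 05/01
# (illustrative values worked out by hand from the example table)
d <- c(30, 29, 29, 29, 26)

# "min" gives tied values the same rank but leaves a gap after the tie
rank_min <- rank(d, ties.method = "min")

# Dense ranking: the position of each value among the sorted unique values,
# so ranks stay consecutive after a tie
rank_dense <- match(d, sort(unique(d)))

print(rank_min)    # 5 2 2 2 1
print(rank_dense)  # 3 2 2 2 1
```

The dense result reproduces the Sequence column requested in the question, whereas "min" jumps from 2 to 5 after the three tied rows.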



data

df <- structure(list(ID = c("A", "A", "A", "A", "A", "B"), Date = c("01/01/2015", 
 "02/01/2015", "02/01/2015", "02/01/2015", "05/01/2015", "01/01/2015"
)), .Names = c("ID", "Date"), class = "data.frame", row.names = c(NA, 
-6L))
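Since the question mentions roughly 300 million rows, one possible variation is to parse the dates and compute the distance once for the whole table, and only then rank within each ID, so that the date conversion is not repeated for every group. This is a sketch under assumptions, not a benchmarked result: the names dt, ref and dist are made up for illustration, and whether it is actually faster depends on the data and the data.table version.

```r
library(data.table)

# Same sample data as above, built directly as a data.table
dt <- data.table(
  ID   = c("A", "A", "A", "A", "A", "B"),
  Date = c("01/01/2015", "02/01/2015", "02/01/2015",
           "02/01/2015", "05/01/2015", "01/01/2015")
)

# Reference date from the question
ref <- as.IDate("2015-01-31")

# Parse the dates once for the whole column and store the absolute
# distance in days as a plain integer column (hypothetical name: dist)
dt[, dist := abs(as.integer(as.IDate(Date, format = "%d/%m/%Y") - ref))]

# Dense-rank the precomputed distance within each ID
dt[, Sequence := frank(dist, ties.method = "dense"), by = ID]

print(dt)
```

The helper column can be dropped afterwards with dt[, dist := NULL] if it is not needed.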
