Rank by two columns and keep ties
Question
I have a dataset such as this one:
ID | Date
A 01/01/2015
A 02/01/2015
A 02/01/2015
A 02/01/2015
A 05/01/2015
B 01/01/2015
I want to rank each date against a reference date, 31/01/2015. The date closest to the reference date is ranked 1, the next closest 2, and so on. The result would look like:
ID | Date | Sequence
A 01/01/2015 3
A 02/01/2015 2
A 02/01/2015 2
A 02/01/2015 2
A 05/01/2015 1
B 01/01/2015 ...
While the rank function does this, I also want to keep all the ties, so that equal dates share the same rank. How do I do that?
Also, I am working with a huge dataset - approx. 300 million rows. So the solution would ideally be fast.
Recommended answer
We can use frank from data.table with "dense" as the ties.method, after grouping by 'ID', on the absolute (abs) difference between the 'Date' and the reference date ('2015-01-31').
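library(data.table)

# Convert df to a data.table by reference, then within each ID rank rows by the
# absolute gap to the reference date; "dense" gives ties the same rank, no gaps.
setDT(df)[, Sequence := frank(abs(as.IDate(Date, "%d/%m/%Y") -
                                  as.IDate("2015-01-31")),
                              ties.method = "dense"), by = ID]
df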
# ID Date Sequence
#1: A 01/01/2015 3
#2: A 02/01/2015 2
#3: A 02/01/2015 2
#4: A 02/01/2015 2
#5: A 05/01/2015 1
#6: B 01/01/2015 1
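Since the question mentions roughly 300 million rows, one possible variation is to parse the date strings once for the whole column rather than inside each group, and only then rank by group. The following is a minimal sketch under that assumption, not a benchmarked solution; the names dt, ref_date and days_from_ref are hypothetical helpers introduced here for illustration.

```r
library(data.table)

# Hypothetical sample data mirroring the question
dt <- data.table(
  ID   = c("A", "A", "A", "A", "A", "B"),
  Date = c("01/01/2015", "02/01/2015", "02/01/2015",
           "02/01/2015", "05/01/2015", "01/01/2015")
)

# Reference date from the question (assumed name: ref_date)
ref_date <- as.IDate("2015-01-31")

# Parse the whole column once and store the absolute gap in days as integers
dt[, days_from_ref := abs(as.integer(as.IDate(Date, format = "%d/%m/%Y")) -
                          as.integer(ref_date))]

# Dense rank within each ID: the smallest gap gets 1, ties share a rank
dt[, Sequence := frank(days_from_ref, ties.method = "dense"), by = ID]

# Drop the helper column
dt[, days_from_ref := NULL]
dt
```

Whether this is actually faster on the full dataset would need to be measured; it only avoids calling the date parser once per group.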
data
df <- structure(list(ID = c("A", "A", "A", "A", "A", "B"), Date = c("01/01/2015",
"02/01/2015", "02/01/2015", "02/01/2015", "05/01/2015", "01/01/2015"
)), .Names = c("ID", "Date"), class = "data.frame", row.names = c(NA,
-6L))
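To make explicit what a "dense" rank means here, the same result can be reproduced in base R without data.table. This is only an illustrative sketch for checking the logic on small data, and it is not expected to scale to hundreds of millions of rows; dense_rank, ref_date and days_from_ref are hypothetical names introduced for this example.

```r
# Same sample data as above, rebuilt so the example is self-contained
df <- data.frame(
  ID   = c("A", "A", "A", "A", "A", "B"),
  Date = c("01/01/2015", "02/01/2015", "02/01/2015",
           "02/01/2015", "05/01/2015", "01/01/2015"),
  stringsAsFactors = FALSE
)

ref_date <- as.Date("2015-01-31")

# Absolute gap in days between each date and the reference date
days_from_ref <- abs(as.numeric(as.Date(df$Date, format = "%d/%m/%Y") - ref_date))

# Dense rank: position of each value among the sorted distinct values,
# so tied values share a rank and no rank numbers are skipped
dense_rank <- function(x) match(x, sort(unique(x)))

# Apply the dense rank separately within each ID
df$Sequence <- ave(days_from_ref, df$ID, FUN = dense_rank)
df$Sequence
# 3 2 2 2 1 1
```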