Ranking multiple columns by different orders using data.table


Question

Using my example below, how can I rank multiple columns in different orders, for example rank y descending and z ascending?

library(data.table)

dt <- data.table(x = c(rep("a", 5), rep("b", 5)),
                 y = abs(rnorm(10)) * 10, z = abs(rnorm(10)) * 10)

cols <- c("y", "z")

# current attempt: ranks every column in cols in ascending order within each group of x
dt[, paste0("rank_", cols) := lapply(.SD, function(x) frankv(x, ties.method = "min")), .SDcols = cols, by = .(x)]

Recommended answer

data.table's frank() function has some useful features which aren't available in base R's rank() function (see ?frank). For example, we can reverse the order of the ranking by prepending the variable with a minus sign:

library(data.table)
# create reproducible data
set.seed(1L)
dt <- data.table(x = c(rep("a", 5), rep("b", 5)),
                 y = abs(rnorm(10)) * 10, z = abs(rnorm(10)) * 10)
# rank y descending, z ascending
dt[, rank_y := frank(-y), x][, rank_z := frank(z), x][]

    x         y          z rank_y rank_z
 1: a  6.264538 15.1178117      3      4
 2: a  1.836433  3.8984324      5      1
 3: a  8.356286  6.2124058      2      2
 4: a 15.952808 22.1469989      1      5
 5: a  3.295078 11.2493092      4      3
 6: b  8.204684  0.4493361      1      2
 7: b  4.874291  0.1619026      4      1
 8: b  7.383247  9.4383621      2      5
 9: b  5.757814  8.2122120      3      4
10: b  3.053884  5.9390132      5      3
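
Note that the question used ties.method = "min", while frank(-y) above relies on the default tie handling ("average"). The random data here contains no ties, so the output is the same either way. The following is a minimal sketch of the difference, using a small hypothetical vector v (not part of the original example) that does contain a tie:

```r
library(data.table)

# hypothetical vector with a tie (two values of 20), for illustration only
v <- c(10, 20, 20, 5)

# descending rank with the default ties.method = "average":
# the two tied values share the mean of ranks 1 and 2
r_avg <- frank(-v)

# descending rank with ties.method = "min":
# the two tied values both get the lowest rank of the tie
r_min <- frank(-v, ties.method = "min")

# frankv() with order = -1L should give the same descending ranks
# without negating the data
r_min_v <- frankv(v, order = -1L, ties.method = "min")

print(r_avg)
print(r_min)
```

If the "min" behaviour from the question is wanted, ties.method = "min" can be passed to frank() in the same way as to frankv().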

If there are many columns which are to be ranked individually, some descending and some ascending, we can do this in two steps:

# step 1: rank the columns that should be in descending order (order = -1L)
cols_desc <- c("y")
dt[, paste0("rank_", cols_desc) := lapply(.SD, frankv, ties.method = "min", order = -1L),
   .SDcols = cols_desc, by = x][]
# step 2: rank the columns that should be in ascending order (order = +1L)
cols_asc <- c("z")
dt[, paste0("rank_", cols_asc) := lapply(.SD, frankv, ties.method = "min", order = +1L),
   .SDcols = cols_asc, by = x][]

    x         y          z rank_y rank_z
 1: a  6.264538 15.1178117      3      4
 2: a  1.836433  3.8984324      5      1
 3: a  8.356286  6.2124058      2      2
 4: a 15.952808 22.1469989      1      5
 5: a  3.295078 11.2493092      4      3
 6: b  8.204684  0.4493361      1      2
 7: b  4.874291  0.1619026      4      1
 8: b  7.383247  9.4383621      2      5
 9: b  5.757814  8.2122120      3      4
10: b  3.053884  5.9390132      5      3
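
The two steps can also be folded into a single call by pairing each column with its own sort direction. This is only a sketch and not part of the original answer: the named vector rank_order is a hypothetical helper introduced here, and base R's Map() is used to walk over the columns of .SD and the directions in parallel.

```r
library(data.table)

# same reproducible data as above
set.seed(1L)
dt <- data.table(x = c(rep("a", 5), rep("b", 5)),
                 y = abs(rnorm(10)) * 10, z = abs(rnorm(10)) * 10)

# hypothetical lookup: column name -> direction
# (-1L = descending, 1L = ascending, as accepted by frankv's order argument)
rank_order <- c(y = -1L, z = 1L)
cols <- names(rank_order)

# Map() pairs the i-th column of .SD with the i-th direction;
# .SDcols = cols guarantees both are in the same order
dt[, paste0("rank_", cols) := Map(function(col, ord) frankv(col, order = ord, ties.method = "min"),
                                  .SD, rank_order),
   .SDcols = cols, by = x]

print(dt)
```

With this layout, adding another column only requires a new entry in rank_order, at the cost of being slightly less explicit than the two separate steps.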
