Add a column of ranks


Question

I have some data:

test <- data.frame(A = c("aaabbb",
                         "aaaabb",
                         "aaaabb",
                         "aaaaab",
                         "bbbaaa",
                         "bbbbaa"))

and so on. All the elements are the same length, and they are already sorted before I get them.

I need to make a new column of ranks: "First", "Second", "Third". Anything after that can be left blank, and it needs to account for ties. So in the above case, I'd like to get the following output:

   A       B
 aaabbb  First
 aaaabb  Second
 aaaabb  Second
 aaaaab  Third
 bbbaaa
 bbbbaa  

I looked at rank() and at some other posts that use it, but I wasn't able to get it to do what I was looking for.

Recommended answer

How about this:

test$B <- match(test$A , unique(test$A)[1:3] )
test
       A  B
1 aaabbb  1
2 aaaabb  2
3 aaaabb  2
4 aaaaab  3
5 bbbaaa NA
6 bbbbaa NA

This is one of many ways to do it. It's possibly not the best, but it readily springs to mind and is fairly intuitive. You can use unique because you receive the data pre-sorted: unique returns values in order of first appearance, so on sorted data that order is the rank order.
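The match() call above yields integers and NA, while the question asked for the labels "First", "Second", "Third" and blanks. A minimal sketch of one way to get there, assuming the six-row example data shown in the outputs; the name rank_labels is introduced here for illustration only:

```r
# Example data, already sorted as described in the question
test <- data.frame(A = c("aaabbb", "aaaabb", "aaaabb",
                         "aaaaab", "bbbaaa", "bbbbaa"))

# Hypothetical lookup vector: position i holds the label for rank i
rank_labels <- c("First", "Second", "Third")

# Integer rank 1..3 for the first three distinct values, NA otherwise
a   <- as.character(test$A)
idx <- match(a, unique(a)[1:3])

# Translate ranks to labels; rows past the third rank get an empty string
test$B <- ifelse(is.na(idx), "", rank_labels[idx])
test$B
# [1] "First"  "Second" "Second" "Third"  ""       ""
```

Using an empty string rather than NA is a choice made here so the column prints as blank; keep NA instead if later code needs to test for missing ranks.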

As the data is sorted, another suitable function worth considering is rle, although it's slightly more obscure in this example:

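# as.character() works whether A is a factor or a character column
rnk <- rle(as.character(test$A))$lengths
rnk
# [1] 1 2 1 1 1
# Ranks 1 to 3 repeated by their run lengths, then NA for every remaining row
test$B <- c(rep(1:3, times = rnk[1:3]), rep(NA, sum(rnk[-(1:3)])))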

rle computes the lengths of runs of equal values in a vector (and the values themselves, which we don't really care about here), so again this works because your data are already sorted.
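To make the description of rle concrete, here is a small sketch on the same six example values, held in a plain character vector x (a name used here for illustration only). It shows both components of the result and that inverse.rle rebuilds the original vector from them:

```r
# The six example values as a plain character vector
x <- c("aaabbb", "aaaabb", "aaaabb", "aaaaab", "bbbaaa", "bbbbaa")

runs <- rle(x)

runs$lengths   # how many times each run repeats
# [1] 1 2 1 1 1
runs$values    # the value that each run consists of
# [1] "aaabbb" "aaaabb" "aaaaab" "bbbaaa" "bbbbaa"

# The run lengths and values together are enough to rebuild the input
identical(inverse.rle(runs), x)
# [1] TRUE
```

Because rle only merges adjacent equal values, the run index equals the rank only when equal values sit next to each other, which is what the pre-sorted input guarantees.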

And if you don't need blanks after the third-ranked item, it's even simpler (and more readable):

test$B <- rep(seq_along(rnk), times = rnk)
