Using multiple columns in dplyr window functions?
Question
Coming from SQL, I would expect to be able to do something like the following in dplyr. Is this possible?
# R
tbl %>% mutate(n = dense_rank(Name, Email))
-- SQL
SELECT Name, Email, DENSE_RANK() OVER (ORDER BY Name, Email) AS n FROM tbl
And what about PARTITION BY?
Recommended answer
I struggled with this problem too, and here is my solution:
If you can't find a function that supports ordering by multiple variables, I suggest concatenating the variables with paste(), in order of priority from left to right.
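Here is a code example:
library(dplyr)
library(tibble)  # provides view(); base R's utils::View() also works

tbl %>%
  mutate(n = dense_rank(paste(Name, Email))) %>%  # rank the concatenated keys
  arrange(Name, Email) %>%
  view()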
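One caveat worth checking with the paste() approach: paste() joins the values with a space and the result is compared as a single string, so a Name that is a prefix of another Name (or a numeric column compared as text) can end up ranked in the wrong place. The sketch below contrasts it with passing the columns as a data frame, which ranks by Name first and then Email. It assumes dplyr 1.1.0 or later, where the ranking functions accept a data frame and `pick()` exists; the data is made up for illustration.

```r
library(dplyr)

# Hypothetical data: "Ann" is a prefix of "Ann B", which trips up paste()
tbl <- tibble(
  Name  = c("Ann", "Ann B", "Bob"),
  Email = c("z@x.org", "a@x.org", "m@x.org")
)

ranked <- tbl %>%
  mutate(
    # Ranks the strings "Ann z@x.org", "Ann B a@x.org", "Bob m@x.org"
    n_paste = dense_rank(paste(Name, Email)),
    # Ranks by Name, breaking ties by Email (needs dplyr >= 1.1.0)
    n_cols  = dense_rank(pick(Name, Email))
  ) %>%
  arrange(Name, Email)

ranked
```

Here the row with Name "Ann" gets n_paste 2 but n_cols 1, because "Ann B a@x.org" sorts before "Ann z@x.org" as a whole string even though "Ann" sorts before "Ann B" as a Name.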
Moreover, I believe group_by() is the dplyr equivalent of SQL's PARTITION BY.
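As a sketch of that correspondence: grouping before mutate() makes the ranking restart within each group, much like PARTITION BY. This assumes dplyr 1.1.0 or later (for data frame input to dense_rank() via `pick()`), and `Dept` is a hypothetical partition column.

```r
library(dplyr)

# Hypothetical data; Dept plays the role of the PARTITION BY column
tbl <- tibble(
  Dept  = c("A", "A", "A", "B", "B"),
  Name  = c("Cy", "Al", "Al", "Bo", "Bo"),
  Email = c("c@x.org", "a@x.org", "b@x.org", "b@x.org", "b@x.org")
)

# Roughly: DENSE_RANK() OVER (PARTITION BY Dept ORDER BY Name, Email)
ranked <- tbl %>%
  group_by(Dept) %>%
  mutate(n = dense_rank(pick(Name, Email))) %>%  # ranked within each Dept
  ungroup()

ranked
```

In Dept "A" the ranks are 3, 1, 2; in Dept "B" the two identical rows tie at 1, as DENSE_RANK would give.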
The shortcoming of this solution is that all the variables (two or more) must be sorted in the same direction. If you need to order by multiple columns in different directions, for example one ascending and one descending, I suggest trying the approach in: Calculate rank with ties based on more than one variable
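For the mixed-direction case, a possible alternative (assuming dplyr 1.1.0 or later, where dense_rank() accepts a data frame) is to build the sort keys with tibble() and wrap the descending column in desc(). `Score` is a hypothetical numeric column used only for illustration.

```r
library(dplyr)

# Hypothetical data with a numeric column to sort in descending order
tbl <- tibble(
  Name  = c("Al", "Al", "Bo", "Al"),
  Score = c(10, 30, 20, 30)
)

# Roughly: DENSE_RANK() OVER (ORDER BY Name ASC, Score DESC)
ranked <- tbl %>%
  mutate(n = dense_rank(tibble(Name, desc(Score))))  # desc() flips Score only

ranked
```

The two ("Al", 30) rows tie at rank 1, ("Al", 10) gets 2 and ("Bo", 20) gets 3.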