Using multiple columns in dplyr window functions?

Question

Coming from SQL, I would expect to be able to do something like the following in dplyr. Is this possible?

# R
tbl %>% mutate(n = dense_rank(Name, Email))

-- SQL
SELECT Name, Email, DENSE_RANK() OVER (ORDER BY Name, Email) AS n FROM tbl

And what about PARTITION BY?

Answer

I did struggle with this problem and here is my solution:

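If you can't find a function that supports ordering by multiple variables, I suggest concatenating them with paste(), from left to right in order of priority. Keep in mind that paste() turns everything into character, so the combined key is compared as text: numeric columns sort lexicographically ("10" comes before "9"), so pad them to a fixed width first if that matters.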

Here is a code example:

library(dplyr)
library(tibble)  # for view()

tbl %>%
  mutate(n = dense_rank(paste(Name, Email))) %>%  # rank by Name, then Email
  arrange(Name, Email) %>%
  view()
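A self-contained version of the paste() approach, using a small hypothetical tbl (the rows are invented for illustration), shows how the combined key reproduces DENSE_RANK() OVER (ORDER BY Name, Email), with both columns ascending:

```r
library(dplyr)

# Hypothetical sample data; any table with Name and Email columns works.
tbl <- tibble(
  Name  = c("Bob", "Alice", "Bob", "Alice", "Alice"),
  Email = c("b@x.com", "a2@x.com", "b@x.com", "a1@x.com", "a2@x.com")
)

# paste() builds one character key per row ("Name Email"), so dense_rank()
# orders by Name first and Email second; identical pairs share a rank.
ranked <- tbl %>%
  mutate(n = dense_rank(paste(Name, Email))) %>%
  arrange(Name, Email)

print(ranked)
```

If you are on dplyr 1.1.0 or later, the ranking functions are documented to accept a data frame for multi-column ranking (for example dense_rank(pick(Name, Email))), which avoids building a string key; check your installed version before relying on it.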

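Moreover, group_by() is the equivalent of PARTITION BY in SQL: after grouping, window functions such as dense_rank() inside mutate() are computed separately within each group.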
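As a sketch of combining the two ideas, here is DENSE_RANK() OVER (PARTITION BY Dept ORDER BY Name, Email) in dplyr. The Dept column and the sample rows are hypothetical, added only for illustration:

```r
library(dplyr)

# Hypothetical sample data with a partitioning column.
tbl <- tibble(
  Dept  = c("A", "A", "A", "B", "B"),
  Name  = c("Carl", "Ann", "Ann", "Dan", "Bea"),
  Email = c("c@x.com", "a@x.com", "a@x.com", "d@x.com", "b@x.com")
)

ranked <- tbl %>%
  group_by(Dept) %>%                              # PARTITION BY Dept
  mutate(n = dense_rank(paste(Name, Email))) %>%  # ORDER BY Name, Email
  ungroup() %>%                                   # drop grouping afterwards
  arrange(Dept, Name, Email)

print(ranked)
```

The ranking restarts at 1 in each Dept, just as it would within each SQL partition.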

The drawback of this solution is that all of the variables (two or more) must be sorted in the same direction. If you need to order by columns in different directions, say one ascending and one descending, I suggest trying this: Calculate rank with ties based on more than one variable
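One alternative for mixed directions, sketched here as a swapped-in technique rather than the method from the linked question, is to rank each column separately in its own direction and then combine those integer ranks into a single numeric key in which the first column dominates. The Score column and the sample rows are hypothetical, and the sketch assumes there are no missing values:

```r
library(dplyr)

# Hypothetical sample data: rank by Name ascending, then Score descending.
tbl <- tibble(
  Name  = c("Bob", "Alice", "Bob", "Alice", "Alice"),
  Score = c(10, 7, 10, 9, 7)
)

# Sketch of DENSE_RANK() OVER (ORDER BY Name ASC, Score DESC).
# Each step in r_name is worth more than the largest r_score, so the key
# orders by r_name first and breaks ties with r_score.
# Assumes no NAs (an NA would make max() and the key NA).
ranked <- tbl %>%
  mutate(
    r_name  = dense_rank(Name),          # ascending
    r_score = dense_rank(desc(Score)),   # descending
    n = dense_rank(r_name * (max(r_score) + 1) + r_score)
  ) %>%
  arrange(Name, desc(Score)) %>%
  select(-r_name, -r_score)

print(ranked)
```

Because the key is built from integer ranks rather than pasted strings, numeric columns are compared numerically, and the same pattern works after group_by().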