Count distinct in window functions

Question

I was trying to count the unique values of column b for each c without doing a group by. I know this could be done with a join. How can I do count(distinct b) over (partition by c) without resorting to a join? Why is count distinct not supported in window functions? Thank you in advance. Given this data frame:

// Assumes a SparkSession named `spark`, as in spark-shell (which already imports this).
import spark.implicits._

val df = Seq(
  ("a1", "b1", "c1"),
  ("a2", "b2", "c1"),
  ("a3", "b3", "c1"),
  ("a31", null, "c1"),
  ("a32", null, "c1"),
  ("a4", "b4", "c11"),
  ("a5", "b5", "c11"),
  ("a6", "b6", "c11"),
  ("a7", "b1", "c2"),
  ("a8", "b1", "c3"),
  ("a9", "b1", "c4"),
  ("a91", "b1", "c5"),
  ("a92", "b1", "c5"),
  ("a93", "b1", "c5"),
  ("a95", "b2", "c6"),
  ("a96", "b2", "c6"),
  ("a97", "b1", "c6"),
  ("a977", null, "c6"),
  ("a98", null, "c8"),
  ("a99", null, "c8"),
  ("a999", null, "c8")
).toDF("a", "b", "c")
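To pin down the intended result, here is a minimal plain-Scala sketch (no Spark needed) of what count(distinct b) over (partition by c) would produce if it were allowed: every row is kept and gets the number of distinct non-null b values in its c partition, since COUNT(DISTINCT b) ignores NULL. The object name DistinctCountPerC is hypothetical, chosen only for illustration.

```scala
object DistinctCountPerC {
  // The question's rows as (a, b, c); null marks a missing b, as in the data frame.
  val rows: Seq[(String, String, String)] = Seq(
    ("a1", "b1", "c1"), ("a2", "b2", "c1"), ("a3", "b3", "c1"),
    ("a31", null, "c1"), ("a32", null, "c1"),
    ("a4", "b4", "c11"), ("a5", "b5", "c11"), ("a6", "b6", "c11"),
    ("a7", "b1", "c2"), ("a8", "b1", "c3"), ("a9", "b1", "c4"),
    ("a91", "b1", "c5"), ("a92", "b1", "c5"), ("a93", "b1", "c5"),
    ("a95", "b2", "c6"), ("a96", "b2", "c6"), ("a97", "b1", "c6"),
    ("a977", null, "c6"),
    ("a98", null, "c8"), ("a99", null, "c8"), ("a999", null, "c8")
  )

  // Distinct non-null b values per c, like COUNT(DISTINCT b), which skips NULL.
  val distinctPerC: Map[String, Int] =
    rows.groupBy(_._3).map { case (c, rs) =>
      c -> rs.map(_._2).filter(_ != null).distinct.size
    }

  // Window-style result: every original row is kept and carries its partition's count.
  val withCount: Seq[(String, String, String, Int)] =
    rows.map { case (a, b, c) => (a, b, c, distinctPerC(c)) }

  def main(args: Array[String]): Unit =
    withCount.foreach(println)
}
```

For this data, rows with c1 and c11 get 3, rows with c6 get 2, rows with c8 get 0 (only nulls), and rows with c2, c3, c4 and c5 get 1.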

Answer

Count of the unique values of column b for each c without doing a group by.

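A typical SQL workaround is to use a subquery that selects distinct (b, c) tuples, and then a window count in the outer query. Count b rather than *, so that NULL values of b are skipped just as COUNT(DISTINCT b) would skip them:

SELECT c, COUNT(b) OVER(PARTITION BY c) cnt
FROM (SELECT DISTINCT b, c FROM mytable) x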
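A minimal Spark sketch, assuming Spark 2.x or 3.x with the Scala API and a local SparkSession. It runs the subquery workaround through spark.sql after registering the data frame as the temp view mytable, and it also shows one commonly used join-free alternative that is not part of the answer above: size(collect_set(b)) over a window partitioned by c. The subquery returns one row per distinct (b, c) pair, while the collect_set version keeps every original row. The object and method names (CountDistinctOverWindow, viaDistinctSubquery, withDistinctCount) are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, collect_set, size}

object CountDistinctOverWindow {

  // The answer's approach: de-duplicate (b, c) first, then count b per c over a window.
  // COUNT(b) skips NULL, matching COUNT(DISTINCT b). Yields one row per distinct (b, c).
  def viaDistinctSubquery(spark: SparkSession, df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("mytable")
    spark.sql(
      """SELECT c, COUNT(b) OVER (PARTITION BY c) AS cnt
        |FROM (SELECT DISTINCT b, c FROM mytable) x""".stripMargin)
  }

  // Join-free alternative that keeps every original row: collect_set drops NULLs and
  // duplicates, so the size of the per-partition set is the distinct count of b.
  // A partition whose b values are all NULL yields an empty set, hence 0.
  def withDistinctCount(df: DataFrame): DataFrame = {
    val byC = Window.partitionBy(col("c"))
    df.withColumn("cnt", size(collect_set(col("b")).over(byC)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("count-distinct-over-window")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A subset of the question's rows; null marks a missing b.
    val df = Seq[(String, String, String)](
      ("a1", "b1", "c1"), ("a2", "b2", "c1"), ("a3", "b3", "c1"),
      ("a31", null, "c1"), ("a32", null, "c1"),
      ("a95", "b2", "c6"), ("a96", "b2", "c6"), ("a97", "b1", "c6"),
      ("a977", null, "c6"),
      ("a98", null, "c8"), ("a99", null, "c8")
    ).toDF("a", "b", "c")

    viaDistinctSubquery(spark, df).orderBy("c").show()
    withDistinctCount(df).orderBy("c", "a").show()

    spark.stop()
  }
}
```

On this subset, both versions report 3 for c1, 2 for c6 and 0 for c8; the collect_set version attaches that count to each of the 11 input rows.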
