Hive - multiple (average) count distincts over layered groups

Published: 2020/11/21 0:34:11 · Tags: sql hive group-by hiveql

Problem description

Given the following source data (say the table name is user_activity):

+---------+-----------+------------+
| user_id | user_type | some_date  |
+---------+-----------+------------+
| 1       | a         | 2018-01-01 |
| 1       | a         | 2018-01-02 |
| 2       | a         | 2018-01-01 |
| 3       | a         | 2018-01-01 |
| 4       | b         | 2018-01-01 |
| 4       | b         | 2018-01-02 |
| 5       | b         | 2018-01-02 |
+---------+-----------+------------+

I would like to get the following result:

+-----------+------------+---------------------+
| user_type | user_count | average_daily_users |
+-----------+------------+---------------------+
| a         | 3          | 2                   |
| b         | 2          | 1.5                 |
+-----------+------------+---------------------+
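For reference, the expected figures follow directly from the sample rows. In the averages below, the two sets in each numerator are the distinct users seen on 2018-01-01 and 2018-01-02 respectively, and the denominator is the number of days on which that user_type had any activity:

```latex
\begin{aligned}
\text{type a:}\ & \text{user\_count} = |\{1,2,3\}| = 3,\quad
  \text{average\_daily\_users} = \frac{|\{1,2,3\}| + |\{1\}|}{2} = \frac{3+1}{2} = 2 \\
\text{type b:}\ & \text{user\_count} = |\{4,5\}| = 2,\quad
  \text{average\_daily\_users} = \frac{|\{4\}| + |\{4,5\}|}{2} = \frac{1+2}{2} = 1.5
\end{aligned}
```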

using a single query without multiple subqueries on the same table.

Using multiple queries, I can get:

user_count:

select
  user_type,
  count(distinct user_id) as user_count
from user_activity
group by user_type

For average_daily_users:

select
  user_type,
  avg(distinct_users) as average_daily_users
from (
  select
    user_type,
    count(distinct user_id) as distinct_users
  from user_activity
  group by user_type, some_date
) daily_counts
group by user_type
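To check the two separate queries against the sample rows, here is a minimal sketch that loads the table into an in-memory SQLite database through Python's standard sqlite3 module. SQLite standing in for Hive is an assumption of this sketch, and load_sample and ROWS are hypothetical names introduced only for illustration. The average query selects user_type in the inner query, since the outer GROUP BY needs that column, and gives the derived table an alias.

```python
import sqlite3

# Sample rows from the question's user_activity table.
ROWS = [
    (1, "a", "2018-01-01"), (1, "a", "2018-01-02"),
    (2, "a", "2018-01-01"), (3, "a", "2018-01-01"),
    (4, "b", "2018-01-01"), (4, "b", "2018-01-02"),
    (5, "b", "2018-01-02"),
]


def load_sample() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory SQLite stand-in for the Hive table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table user_activity (user_id integer, user_type text, some_date text)"
    )
    conn.executemany("insert into user_activity values (?, ?, ?)", ROWS)
    return conn


USER_COUNT_SQL = """
select user_type, count(distinct user_id) as user_count
from user_activity
group by user_type
"""

# The inner query also selects user_type; otherwise the outer
# GROUP BY would have no user_type column to group on.
AVG_DAILY_SQL = """
select user_type, avg(distinct_users) as average_daily_users
from (
  select user_type, count(distinct user_id) as distinct_users
  from user_activity
  group by user_type, some_date
) daily_counts
group by user_type
"""

conn = load_sample()
user_counts = dict(conn.execute(USER_COUNT_SQL).fetchall())
avg_daily = dict(conn.execute(AVG_DAILY_SQL).fetchall())
print(sorted(user_counts.items()))  # [('a', 3), ('b', 2)]
print(sorted(avg_daily.items()))    # [('a', 2.0), ('b', 1.5)]
```

These are still two separate statements, each reading user_activity, which is the double pass the question is concerned about.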

But I can't seem to write a query that does what I want in one go. I am concerned about the performance impact of multiple subqueries on the same table (it will have to scan through the table twice... right?). I have a rather large data source and would like to minimize running time.
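One possible single-pass formulation, offered as a sketch rather than a confirmed answer, relies on an identity: summing the daily distinct-user counts over the days a user_type was active gives the number of distinct (user_id, some_date) pairs, so the average equals that pair count divided by the number of distinct some_date values. Both columns can then come from a single GROUP BY user_type. SQLite's count(distinct ...) accepts only one expression, so the sketch counts a concatenated user_id || '|' || some_date key. In HiveQL, COUNT(DISTINCT expr[, expr...]) should accept count(distinct user_id, some_date) directly, and Hive's / should already return a double; verify both on your Hive version. As before, SQLite stands in for Hive, and load_sample and ROWS are hypothetical names.

```python
import sqlite3

# Sample rows from the question's user_activity table.
ROWS = [
    (1, "a", "2018-01-01"), (1, "a", "2018-01-02"),
    (2, "a", "2018-01-01"), (3, "a", "2018-01-01"),
    (4, "b", "2018-01-01"), (4, "b", "2018-01-02"),
    (5, "b", "2018-01-02"),
]


def load_sample() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory SQLite stand-in for the Hive table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table user_activity (user_id integer, user_type text, some_date text)"
    )
    conn.executemany("insert into user_activity values (?, ?, ?)", ROWS)
    return conn


# Average of daily distinct users over the days a user_type was active
#   = distinct (user_id, some_date) pairs / distinct some_date values.
# The 1.0 * forces real division; SQLite's integer / would truncate 3 / 2 to 1.
SINGLE_PASS_SQL = """
select
  user_type,
  count(distinct user_id) as user_count,
  1.0 * count(distinct user_id || '|' || some_date)
      / count(distinct some_date) as average_daily_users
from user_activity
group by user_type
"""

conn = load_sample()
result = {t: (uc, adu) for t, uc, adu in conn.execute(SINGLE_PASS_SQL)}
print(sorted(result.items()))  # [('a', (3, 2.0)), ('b', (2, 1.5))]
```

This only shows that the single GROUP BY reproduces the expected figures on the sample rows. Whether one scan with several distinct counts actually runs faster than two scans on a large Hive table depends on the engine and the data, and the sketch does not establish that.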

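NOTE: This question is titled Hive because that is what I am using, but I think it is a generic enough SQL question that I am not ruling out answers in other SQL dialects.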

NOTE2: This question shares details with my other question on partition by columns in window functions (for computing the average daily users column).
