How to group by multiple columns and then transpose in Hive


Problem description


I have some data that I want to group by on multiple columns, perform an aggregation function on, and then transpose into different columns using Hive.

For example, given this input:

hr  type value
01  a    10
01  b    20
01  c    50
01  a    30
02  c    10
02  b    90
02  a    80

I want to produce this output:

Output:

hr  a_avg  b_avg  c_avg
01  20     20     50
02  80     90     10

There should be one output column for each distinct type in my input; a_avg, for example, is the average of the a values for each hour.

How can I do this in Hive? I am guessing I might need to make use of https://github.com/klout/brickhouse/wiki/Collect-UDFs

So far the best I can think of is to use multiple group-by clauses, but that won't transpose the data into multiple columns.

Any ideas?

Solution

You don't necessarily need to use Brickhouse, but it will definitely make it easier. Here is roughly what I have in mind:

select hr
  , type_map['a'] a_avg
  , type_map['b'] b_avg
  , type_map['c'] c_avg
from (
  select hr
    , collect(type, avg_value) type_map -- Brickhouse collect; creates a map
  from (
    select hr
      , type
      , avg( value ) avg_value
    from db.table
    group by hr, type ) x
  group by hr ) y
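
Since the answer notes that Brickhouse is not strictly required, here is a minimal sketch of one way to get the same result with only built-in Hive functions, using conditional aggregation. It assumes the set of types (a, b, c) is known in advance, and the table name db.hourly_values is a hypothetical placeholder with the columns hr, type and value from the question. It is a common alternative, not necessarily what the answer's author had in mind.

```sql
-- Sketch only: db.hourly_values is a hypothetical table name;
-- the columns hr, type and value are taken from the question.
select hr
  -- case without an else branch yields NULL for other types,
  -- and avg() skips NULLs, so each column averages only its own type
  , avg(case when type = 'a' then value end) as a_avg
  , avg(case when type = 'b' then value end) as b_avg
  , avg(case when type = 'c' then value end) as c_avg
from db.hourly_values
group by hr
```

This needs a single pass and a single group by, but every new type requires a new hand-written column; the map-based query above has the same limitation in its outer select.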
