How to transpose/pivot data in Hive?


Question

I know there's no direct way to transpose data in Hive. I followed this question: "Is there a way to transpose data in Hive?", but since there is no final answer there, I could not get all the way.

Here is my table:

 | ID   |   Code   |  Proc1   |   Proc2 | 
 | 1    |    A     |   p      |   e     | 
 | 2    |    B     |   q      |   f     |
 | 3    |    B     |   p      |   f     |
 | 3    |    B     |   q      |   h     |
 | 3    |    B     |   r      |   j     |
 | 3    |    C     |   t      |   k     |

Here Proc1 can have any number of values. ID, Code and Proc1 together form a unique key for this table. I want to pivot/transpose this table so that each unique value in Proc1 becomes a new column, and the corresponding value from Proc2 is the value in that column for the corresponding row. In essence, I'm trying to get something like:

 | ID   |   Code   |  p   |   q |  r  |   t |
 | 1    |    A     |   e  |     |     |     |
 | 2    |    B     |      |   f |     |     |
 | 3    |    B     |   f  |   h |  j  |     |
 | 3    |    C     |      |     |     |  k  |

In the new transformed table, ID and Code together form the primary key. From the question I mentioned above, I could get this far using the to_map UDAF. (Disclaimer: this may not be a step in the right direction; I mention it here in case it is.)

 | ID   |   Code   |  Map_Aggregation   | 
 | 1    |    A     |   {p:e}            |
 | 2    |    B     |   {q:f}            |
 | 3    |    B     |   {p:f, q:h, r:j } |  
 | 3    |    C     |   {t:k}            |

But I don't know how to get from this step to the pivoted/transposed table I want. Any help on how to proceed would be great! Thanks.

Recommended answer

Here is the solution I ended up using:

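-- Register the Brickhouse jar and expose its collect UDAF.
add jar brickhouse-0.7.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

-- Inner query: build one map of proc1 -> proc2 per (id, code).
-- Outer query: read each known proc1 key out of the map as its own column.
select
    id,
    code,
    group_map['p'] as p,
    group_map['q'] as q,
    group_map['r'] as r,
    group_map['t'] as t
from (
    select
        id,
        code,
        collect(proc1, proc2) as group_map
    from test_sample
    group by id, code
) gm;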

The UDAF used here (referred to as to_map in the question, registered above as collect) comes from the Brickhouse repo: https://github.com/klout/brickhouse
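
For readers who want to check the logic without a Hive cluster, here is a minimal Python sketch of the same two steps: aggregate Proc1 -> Proc2 into one map per (ID, Code), then read a fixed list of keys out of each map. It is only a model of the query, not Hive or Brickhouse code; the `pivot` function and the `rows` list are hypothetical names introduced for illustration, and the sample data is copied from the question. A key missing from a group's map yields None here, in the same way that looking up a missing key in a Hive map yields NULL.

```python
from collections import defaultdict

# Sample rows from the question: (id, code, proc1, proc2).
rows = [
    (1, "A", "p", "e"),
    (2, "B", "q", "f"),
    (3, "B", "p", "f"),
    (3, "B", "q", "h"),
    (3, "B", "r", "j"),
    (3, "C", "t", "k"),
]


def pivot(rows, columns):
    """Hypothetical helper: group by (id, code), then project the given keys."""
    # Step 1: one proc1 -> proc2 map per (id, code), like the inner query.
    group_maps = defaultdict(dict)
    for id_, code, proc1, proc2 in rows:
        group_maps[(id_, code)][proc1] = proc2

    # Step 2: read each requested key out of the map, like group_map['p'] as p.
    # dict.get returns None when the group has no value for that key.
    return [
        (id_, code) + tuple(group_map.get(col) for col in columns)
        for (id_, code), group_map in sorted(group_maps.items())
    ]


pivoted = pivot(rows, ["p", "q", "r", "t"])
for row in pivoted:
    print(row)
```

As in the query above, the output columns have to be listed up front. If Proc1 can take values that are not known in advance, the column list would need to be generated first, for example from the distinct Proc1 values.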
