How to transpose/pivot data in Hive?
Question
I know there's no direct way to transpose data in Hive. I followed this question: "Is there a way to transpose data in Hive?", but since it has no final answer, I could not get all the way.
Here is the table I have:
| ID | Code | Proc1 | Proc2 |
| 1 | A | p | e |
| 2 | B | q | f |
| 3 | B | p | f |
| 3 | B | q | h |
| 3 | B | r | j |
| 3 | C | t | k |
Here Proc1 can have any number of values. ID, Code, and Proc1 together form a unique key for this table. I want to pivot/transpose this table so that each unique value in Proc1 becomes a new column, and the corresponding value from Proc2 is the value in that column for the corresponding row. In essence, I'm trying to get something like:
| ID | Code | p | q | r | t |
| 1 | A | e | | | |
| 2 | B | | f | | |
| 3 | B | f | h | j | |
| 3 | C | | | | k |
In the new transformed table, ID and Code together form the primary key. From the question I mentioned above, I could get this far using the to_map UDAF. (Disclaimer: this may not be a step in the right direction, but I mention it here in case it is.)
| ID | Code | Map_Aggregation |
| 1 | A | {p:e} |
| 2 | B | {q:f} |
| 3 | B | {p:f, q:h, r:j } |
| 3 | C | {t:k} |
But I don't know how to get from this step to the pivoted/transposed table I want. Any help on how to proceed would be great! Thanks.
Recommended answer
Here is the solution I ended up using:
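-- Register the Brickhouse jar and its collect UDAF.
add jar brickhouse-0.7.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

-- Inner query: build one proc1 -> proc2 map per (id, code).
-- Outer query: read each known proc1 key out of the map as its own column;
-- a key that is missing from the map yields NULL.
select
    id,
    code,
    group_map['p'] as p,
    group_map['q'] as q,
    group_map['r'] as r,
    group_map['t'] as t
from (
    select
        id, code,
        collect(proc1, proc2) as group_map
    from test_sample
    group by id, code
) gm;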
The collect UDAF used above, which builds the same kind of map as to_map, comes from the Brickhouse repo: https://github.com/klout/brickhouse
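If adding the Brickhouse jar is not an option, the same result can probably be reached with conditional aggregation, using only built-in functions. This is a minimal sketch, not the approach from the answer above. It assumes the same test_sample table with columns id, code, proc1, and proc2, and it assumes the distinct Proc1 values (p, q, r, t) are known in advance, just as they must be for the map lookups in the original query:

```sql
-- Sketch only: pivot with built-in conditional aggregation (no UDAF needed).
-- Assumes test_sample(id, code, proc1, proc2) and a known list of proc1 values.
select
    id,
    code,
    -- Each CASE keeps proc2 only for the matching proc1, and NULL otherwise.
    max(case when proc1 = 'p' then proc2 end) as p,
    max(case when proc1 = 'q' then proc2 end) as q,
    max(case when proc1 = 'r' then proc2 end) as r,
    max(case when proc1 = 't' then proc2 end) as t
from test_sample
group by id, code;
```

Because (ID, Code, Proc1) is unique, each CASE expression produces at most one non-NULL value per group, so max simply picks that value; groups with no row for a given Proc1 value get NULL in that column. Like the map-based query, this does not discover new Proc1 values on its own: the column list has to be written out or generated before the query runs.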