Pivoting in Pig

Question

This is related to the question "Pivot table with Apache Pig". I have the following input data:

Id    Name     Value 
1     Column1  Row11 
1     Column2  Row12 
1     Column3  Row13 
2     Column1  Row21 
2     Column2  Row22 
2     Column3  Row23 

and I want to pivot it to get this output:

Id    Column1 Column2 Column3 
1      Row11    Row12   Row13 
2      Row21    Row22   Row23 

Please let me know how to do this in Pig.

Recommended answer

The simplest way to do it without a UDF is to group on Id, then, in a nested foreach, filter the rows for each column name and combine them in the generate. See the script:

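-- The default loader (PigStorage) expects tab-separated fields.
inpt = load '~/rows_to_cols.txt' as (Id : chararray, Name : chararray, Value: chararray);
-- One group (and so one output row) per Id.
grp = group inpt by Id;
maps = foreach grp {
    -- Within each group, pick out the rows for each target column.
    col1 = filter inpt by Name == 'Column1';
    col2 = filter inpt by Name == 'Column2';
    col3 = filter inpt by Name == 'Column3';
    -- flatten unnests each one-row bag into a plain column value.
    -- Note: an Id that lacks one of the names yields an empty bag, and flatten then drops that Id's row.
    generate flatten(group) as Id,
             flatten(col1.Value) as Column1,
             flatten(col2.Value) as Column2,
             flatten(col3.Value) as Column3;
};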

Output:

(1,Row11,Row12,Row13)
(2,Row21,Row22,Row23)

Another option would be to write a UDF that converts a bag{name, value} into a map[], then get the values by using the column names as keys (e.g. vals#'Column1').
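
A minimal sketch of such a UDF, written in Python for Pig's Jython UDF support. The file name bag_to_map.py, the function name bag_to_map, and the alias udfs are made up for illustration, and the Pig usage in the comments is an untested assumption about how the pieces would fit together. The sketch also assumes that Pig passes a bag to a Jython UDF as a list of tuples, accepts a dict as a map, and makes the outputSchema decorator available without an import; the fallback only exists so the file can also be run under plain Python.

```python
# bag_to_map.py (hypothetical file name)
#
# Assumed Pig usage, not taken from the original answer:
#   register 'bag_to_map.py' using jython as udfs;
#   grp  = group inpt by Id;
#   maps = foreach grp generate group as Id,
#                               udfs.bag_to_map(inpt.(Name, Value)) as vals;
#   res  = foreach maps generate Id, vals#'Column1' as Column1,
#                               vals#'Column2' as Column2,
#                               vals#'Column3' as Column3;

try:
    outputSchema  # assumed to be predefined when Pig's Jython engine loads the file
except NameError:
    def outputSchema(schema):
        """No-op stand-in so the module also loads outside Pig."""
        def decorator(func):
            return func
        return decorator


@outputSchema("vals:map[]")
def bag_to_map(pairs):
    """Turn a bag of (name, value) tuples into a map keyed by name."""
    result = {}
    for pair in pairs or []:
        name, value = pair[0], pair[1]
        if name is not None:
            # If a name occurs more than once, the last value wins.
            result[name] = value
    return result
```

Unlike the nested foreach version, a missing column name here simply gives a null for vals#'ColumnN' rather than dropping the whole row, since looking up an absent key in a Pig map returns null.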
