Pig Latin split columns to rows

Problem description

Is there any solution in Pig Latin to transform columns into rows to get the output below?

Input:

id|column1|column2
1|a,b,c|1,2,3
2|d,e,f|4,5,6

Required output:

id|column1|column2
1|a|1
1|b|2
1|c|3
2|d|4
2|e|5
2|f|6

Thanks

Recommended answer

I'm willing to bet this is not the best way to do this, however ...

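-- Load the pipe-delimited input; both list columns arrive as plain strings.
data = load 'input' using PigStorage('|') as (id:chararray, col1:chararray, 
       col2:chararray);
-- TOKENIZE splits on commas (one of its default delimiters); flatten yields one row per token.
A = foreach data generate id, flatten(TOKENIZE(col1));
B = foreach data generate id, flatten(TOKENIZE(col2));
-- RANK prepends a sequential row number, which serves as the join key below.
RA = RANK A;
RB = RANK B;
-- Round-trip through storage; this assumes each output has a single part file.
store RA into 'ra_temp' using PigStorage(',');
store RB into 'rb_temp' using PigStorage(',');
data_a = load 'ra_temp/part-m-00000' using PigStorage(',');
data_b = load 'rb_temp/part-m-00000' using PigStorage(',');
-- Join on the rank ($0), then keep the id, the col1 token, and the col2 token.
jed = JOIN data_a BY $0, data_b BY $0;
final = foreach jed generate $1, $2, $5;
dump final;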

(1,a,1)
(1,b,2)
(1,c,3)
(2,d,4)
(2,e,5)
(2,f,6)

store final into '~/some_dir' using PigStorage('|');

Edit: I really like this question and was discussing it with a co-worker, who came up with a much simpler and more elegant solution. If you have Jython installed ...

# Create a file called udf.py

@outputSchema("innerBag:bag{innerTuple:(column1:chararray, column2:chararray)}")
def pigzip(column1, column2):
    c1 = column1.split(',')
    c2 = column2.split(',')
    innerBag = zip(c1, c2)
    return innerBag

Then in Pig:

$ pig -x local
register udf.py using jython as udf;
data = load 'input' using PigStorage('|') as (id:chararray, column1:chararray,
       column2:chararray);
result = foreach data generate id, flatten(udf.pigzip(column1, column2));
dump result;
store result into 'output' using PigStorage('|');
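
To sanity-check the zip logic before registering the UDF, it can be run in plain Python without Pig. The following is only a sketch: the @outputSchema decorator is left out because Pig supplies it when the script is registered, and explode_row is a hypothetical helper (not part of Pig or of the answer above) that imitates what generate id, flatten(...) does to a single input line.

```python
# Local check of the zip logic used by the Jython UDF; no Pig required.
# NOTE: explode_row is a hypothetical helper name introduced for illustration.

def pigzip(column1, column2):
    """Pair up the comma-separated values of two columns by position."""
    c1 = column1.split(',')
    c2 = column2.split(',')
    # list() keeps the result a list under both Python 2 (Jython) and Python 3.
    return list(zip(c1, c2))


def explode_row(line, sep='|'):
    """Imitate `generate id, flatten(pigzip(...))` for one input line."""
    row_id, column1, column2 = line.split(sep)
    return [(row_id, a, b) for a, b in pigzip(column1, column2)]


if __name__ == '__main__':
    for line in ['1|a,b,c|1,2,3', '2|d,e,f|4,5,6']:
        for out in explode_row(line):
            print(sep.join(out) if False else '|'.join(out))
```

One thing to keep in mind: zip stops at the shorter list, so a row whose two columns hold different numbers of values will silently lose the trailing ones.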
