Merge two lines in Pig


Question

I would like to write a Pig script that merges each pair of consecutive lines into a single line.

The input is:

ABC,DEF,,
,,GHI,JKL
MNO,PQR,,
,,STU,VWX

The output should be:

ABC,DEF,GHI,JKL
MNO,PQR,STU,VWX

Can anyone help me with this?

Recommended answer

This is difficult to solve with native Pig alone. One option is to download the datafu-1.2.0.jar library and try the approach below.

input.txt

ABC,DEF,,
,,GHI,JKL
MNO,PQR,,
,,STU,VWX

Pig script:

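-- Register DataFu and alias its BagSplit UDF.
REGISTER /tmp/datafu-1.2.0.jar;
DEFINE BagSplit datafu.pig.bags.BagSplit();

A = LOAD 'input.txt' USING PigStorage(',') AS (f1, f2, f3, f4);
-- Collect every row into a single bag.
B = GROUP A ALL;
-- Split that bag into bags of two rows each.
C = FOREACH B GENERATE FLATTEN(BagSplit(2, $1)) AS mybag;
-- Join each pair into one '_'-delimited string, drop the run of four nulls, then split it back into four fields.
D = FOREACH C GENERATE FLATTEN(STRSPLIT(REPLACE(BagToString(mybag), '_null_null_null_null', ''), '_', 4));
-- Reorder the columns.
E = FOREACH D GENERATE $2, $3, $0, $1;
DUMP E;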

Output:

(MNO,PQR,STU,VWX)
(ABC,DEF,GHI,JKL)
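
For readers who want to check the intended result outside a Pig installation, here is a minimal Python sketch of the same idea: take the rows two at a time and, for each column, keep whichever of the two values is non-empty. It is not a translation of the Pig script; the function name `merge_pairs` is made up for illustration, and the sketch assumes the rows are processed in file order.

```python
def merge_pairs(lines):
    """Merge every two consecutive comma-separated rows into one row.

    Assumption (not from the original answer): rows are handled in file
    order, and an empty field stands for a missing value.
    """
    rows = [line.rstrip("\n").split(",") for line in lines if line.strip()]
    merged = []
    for i in range(0, len(rows), 2):
        pair = rows[i:i + 2]
        # For each column, keep the first non-empty value found in the pair.
        merged.append([next((v for v in col if v), "") for col in zip(*pair)])
    return merged


sample = ["ABC,DEF,,", ",,GHI,JKL", "MNO,PQR,,", ",,STU,VWX"]
for row in merge_pairs(sample):
    print(",".join(row))
# prints:
# ABC,DEF,GHI,JKL
# MNO,PQR,STU,VWX
```

Unlike the DUMP output above, this sketch keeps the merged rows in input order.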

Note: This relies on the input format shown above. I am assuming that the last two columns of the 1st row are null and the first two columns of the 2nd row are null, and likewise for the 3rd and 4th rows.
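
Because the approach depends on that layout, it may be worth checking the input before running the job. The following Python sketch is one way to do that; the helper name `follows_assumed_layout` is hypothetical, and it treats an empty field as null, which is an assumption about how the file is written.

```python
def follows_assumed_layout(lines):
    """Return True if rows alternate between 'a,b,,' and ',,c,d'.

    Assumption: every row has exactly four comma-separated fields and an
    empty field stands for null.
    """
    rows = [line.strip().split(",") for line in lines if line.strip()]
    if len(rows) % 2 != 0:
        return False  # an unpaired trailing row breaks the pattern
    for index, row in enumerate(rows):
        if len(row) != 4:
            return False
        filled = [bool(value) for value in row]
        if index % 2 == 0:
            expected = [True, True, False, False]   # 1st, 3rd, ... rows
        else:
            expected = [False, False, True, True]   # 2nd, 4th, ... rows
        if filled != expected:
            return False
    return True


sample = ["ABC,DEF,,", ",,GHI,JKL", "MNO,PQR,,", ",,STU,VWX"]
print(follows_assumed_layout(sample))  # prints: True
```

If the check returns False, the merged output should not be trusted without adapting the script to the actual layout.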
