Pig - RANK Operation on Groups
Problem description
I'm new to Pig and I'm trying to perform a RANK operation within groups. My data looks like this:
Name  Address  Date
A     addr1    20150101
A     addr2    20150130
B     addr1    20140325
B     addr2    20140821
B     addr3    20150102
I want my output to look like this:
Name  Address  Date      Rank
A     addr1    20150101  1
A     addr2    20150130  2
B     addr1    20140325  1
B     addr2    20140821  2
B     addr3    20150102  3
I'm using Pig 0.12.1. Is there any way to get the output in the required format with Pig built-in functions?
Solving this with standard Pig is somewhat awkward, because the RANK operator ranks a whole relation and cannot be used inside a nested FOREACH. With the help of the DataFu library, however, you can solve it easily.
Download the jar file (datafu-1.2.0.jar) from http://mvnrepository.com/artifact/com.linkedin.datafu/datafu/1.2.0, add it to your classpath, and try the approach below.
Input:
A addr1 20150101
A addr2 20150130
B addr1 20140325
B addr2 20140821
B addr3 20150102
Pig script:
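REGISTER /tmp/datafu-1.2.0.jar;
-- Enumerate('1') appends a 1-based position to every tuple in a bag
DEFINE Enumerate datafu.pig.bags.Enumerate('1');
A = LOAD 'input' USING PigStorage() AS (Name:chararray, Address:chararray, Date:chararray);
B = GROUP A BY Name;
-- bag order after GROUP is not guaranteed, so sort each group by Date before numbering
C = FOREACH B {
    sorted = ORDER A BY Date;
    GENERATE FLATTEN(Enumerate(sorted));
};
DUMP C;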
Output:
(A,addr1,20150101,1)
(A,addr2,20150130,2)
(B,addr1,20140325,1)
(B,addr2,20140821,2)
(B,addr3,20150102,3)
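To make the per-group numbering concrete, here is a small local Python sketch of the same semantics, assuming the rows fit in memory: group by Name, order each group by Date, then append a 1-based position the way Enumerate('1') does. `rank_within_groups` is a hypothetical helper for illustration only; it is not part of Pig or DataFu and does not reflect how DataFu implements Enumerate.

```python
from itertools import groupby
from operator import itemgetter


def rank_within_groups(rows, start=1):
    """Hypothetical helper: mimics GROUP BY Name + nested ORDER BY Date + Enumerate('1')."""
    result = []
    # GROUP BY Name: sort by the key first so groupby sees each name as one contiguous run
    for _, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        # Nested ORDER BY Date: YYYYMMDD strings sort chronologically
        ordered = sorted(group, key=itemgetter(2))
        # Enumerate('1'): append a consecutive position, starting at `start`
        for position, row in enumerate(ordered, start=start):
            result.append(row + (position,))
    return result


rows = [
    ("A", "addr1", "20150101"),
    ("A", "addr2", "20150130"),
    ("B", "addr1", "20140325"),
    ("B", "addr2", "20140821"),
    ("B", "addr3", "20150102"),
]

for t in rank_within_groups(rows):
    print(t)
```

Like Enumerate, this numbers rows consecutively, so two rows with the same Date in one group get different numbers rather than a shared rank. Pig also does not guarantee the order in which groups appear in its output; the sketch simply sorts them by name.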