Pig - RANK Operation on Groups
Question
I'm new to Pig and I'm trying to perform a RANK operation within groups. My data looks like this:
Name address Date
A addr1 20150101
A addr2 20150130
B addr1 20140325
B addr2 20140821
B addr3 20150102
I want output like this:
Name address Date Rank
A addr1 20150101 1
A addr2 20150130 2
B addr1 20140325 1
B addr2 20140821 2
B addr3 20150102 3
I'm using Pig 0.12.1. Is there any way to get the output in the required format with Pig built-in functions?
Answer
It is a little difficult to solve this problem with standard Pig alone, because the built-in RANK operator ranks a whole relation rather than the rows within each group. With the help of the DataFu library, however, you can solve it easily.
Download the jar file (datafu-1.2.0.jar) from this link: http://mvnrepository.com/artifact/com.linkedin.datafu/datafu/1.2.0. Put it on your classpath and try the approach below.
Input:
A addr1 20150101
A addr2 20150130
B addr1 20140325
B addr2 20140821
B addr3 20150102
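PigScript:
REGISTER /tmp/datafu-1.2.0.jar;
-- Enumerate appends each tuple's position in the bag, starting at 1
define Enumerate datafu.pig.bags.Enumerate('1');
-- PigStorage() splits fields on tabs by default
A = LOAD 'input' USING PigStorage() AS (Name:chararray,Address:chararray,Date:chararray);
B = GROUP A BY Name;
C = FOREACH B {
    -- tuple order inside a grouped bag is not guaranteed, so sort each group by Date first
    sorted_a = ORDER A BY Date ASC;
    GENERATE FLATTEN(Enumerate(sorted_a));
};
DUMP C;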
Output:
(A,addr1,20150101,1)
(A,addr2,20150130,2)
(B,addr1,20140325,1)
(B,addr2,20140821,2)
(B,addr3,20150102,3)
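The following plain-Python sketch models what the script computes: group by Name, order each group by Date, then number the rows from 1. It is only a model of the semantics, not how DataFu implements Enumerate. The function name `rank_within_groups` and the `SAMPLE` data are illustrative. Note that this is row numbering: two rows with the same Date in one group would get different numbers rather than a shared rank.

```python
from itertools import groupby
from operator import itemgetter

# Same rows as the input above: (Name, Address, Date).
SAMPLE = [
    ("A", "addr1", "20150101"),
    ("A", "addr2", "20150130"),
    ("B", "addr1", "20140325"),
    ("B", "addr2", "20140821"),
    ("B", "addr3", "20150102"),
]


def rank_within_groups(rows, start=1):
    """Number rows within each Name group, ordered by Date.

    Dates are 'YYYYMMDD' strings, so string order equals chronological order.
    Returns (name, address, date, rank) tuples.
    """
    out = []
    # groupby() needs its input sorted by the grouping key; sorting by
    # (name, date) also orders each group by date, like the nested ORDER.
    ordered = sorted(rows, key=itemgetter(0, 2))
    for _, group in groupby(ordered, key=itemgetter(0)):
        # enumerate() numbers rows consecutively from `start`, like Enumerate('1').
        for rank, (name, addr, date) in enumerate(group, start=start):
            out.append((name, addr, date, rank))
    return out


if __name__ == "__main__":
    for row in rank_within_groups(SAMPLE):
        print(row)
```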