Pig - RANK Operation on Groups


Question

I'm new to Pig and I'm trying to perform a RANK operation within each group. My data looks like this:



   Name address Date
    A   addr1   20150101
    A   addr2   20150130
    B   addr1   20140325
    B   addr2   20140821
    B   addr3   20150102

I want output like this:



    Name    address Date     Rank
    A   addr1   20150101  1
    A   addr2   20150130  2
    B   addr1   20140325  1
    B   addr2   20140821  2
    B   addr3   20150102  3

I'm using Pig 0.12.1. Is there any way to get the output in the required format with Pig built-in functions?

Recommended answer

This is somewhat difficult to solve with standard Pig alone, but with the help of the DataFu library you can solve it easily.

Download the jar file (datafu-1.2.0.jar) from http://mvnrepository.com/artifact/com.linkedin.datafu/datafu/1.2.0, make it available to Pig (the script below registers it from /tmp), and try the approach below.

Input:

A       addr1   20150101
A       addr2   20150130
B       addr1   20140325
B       addr2   20140821
B       addr3   20150102

PigScript:

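-- Register the DataFu jar and define Enumerate to start numbering at 1.
REGISTER /tmp/datafu-1.2.0.jar;
DEFINE Enumerate datafu.pig.bags.Enumerate('1');

A = LOAD 'input' USING PigStorage() AS (Name:chararray, Address:chararray, Date:chararray);
B = GROUP A BY Name;
-- Sort each group's bag by Date first, because bag order after GROUP is not guaranteed.
C = FOREACH B {
    sorted = ORDER A BY Date;
    GENERATE FLATTEN(Enumerate(sorted));
};
DUMP C;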

Output:

(A,addr1,20150101,1)
(A,addr2,20150130,2)
(B,addr1,20140325,1)
(B,addr2,20140821,2)
(B,addr3,20150102,3)
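
To make the logic of the script easier to follow, here is a minimal Python sketch of the same idea: group the rows by Name, order each group by Date, and number the rows starting from 1. It is only a model of the semantics, not Pig code and not how DataFu is implemented; the function name `rank_within_groups` is hypothetical, and it assumes the Date strings are in yyyymmdd form so that string order matches date order.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows copied from the input above: (Name, Address, Date).
rows = [
    ("A", "addr1", "20150101"),
    ("A", "addr2", "20150130"),
    ("B", "addr1", "20140325"),
    ("B", "addr2", "20140821"),
    ("B", "addr3", "20150102"),
]


def rank_within_groups(records, start=1):
    """Append a per-Name rank to each record, ordered by Date (hypothetical helper)."""
    ranked = []
    # Sorting by (Name, Date) puts each group's rows together in date order.
    ordered = sorted(records, key=itemgetter(0, 2))
    for _, group in groupby(ordered, key=itemgetter(0)):
        # The counter restarts at `start` for every group, like Enumerate('1').
        for rank, record in enumerate(group, start=start):
            ranked.append(record + (rank,))
    return ranked


for row in rank_within_groups(rows):
    print(row)
```

The counter restarting for every group is what separates this from a rank computed over the whole relation.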
