Projecting Grouped Tuples in Pig


Question

I have a collection of tuples of the form (t,a,b) that I want to group by b in Pig. Once grouped, I want to drop b from the tuples in each group and generate one bag of the resulting (t,a) tuples per group.

As an example, assume we have (1,2,1) (2,0,1) (3,4,2) (4,1,2) (5,2,3)

The Pig script would produce {(1,2),(2,0)} {(3,4),(4,1)} {(5,2)}

The question is: how do I go about producing this result? I'm used to seeing examples where an aggregation follows the GROUP BY operation. It's less clear to me how to drop a field from the grouped tuples and return them in a bag. Thanks for your assistance!

Recommended Answer

Turns out what I was looking for is the syntax for nested projection in Pig.

If one has tuples of the form (t,a,b) in a relation named tups and wants to drop b after the group by, it is done this way.

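-- Group by b; each output row is (group, tups), where tups is a bag of (t,a,b) tuples.
grouped = GROUP tups BY b;
-- Nested projection: keep only t and a from each tuple in the bag.
result = FOREACH grouped GENERATE tups.(t,a);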
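For anyone who wants to try this end to end, here is a minimal sketch. It assumes the sample tuples are stored one per line in a comma-separated file; the file name 'tups.csv', the int schema, and the alias names b and projected are made up for illustration and are not part of the original answer.

```pig
-- Assumption: 'tups.csv' is a hypothetical file with lines such as 1,2,1
tups = LOAD 'tups.csv' USING PigStorage(',') AS (t:int, a:int, b:int);

-- GROUP yields rows of (group, tups); the bag takes the name of the grouped relation.
grouped = GROUP tups BY b;

-- Keep the group key next to the projected bag so each bag can be traced back to its b.
result = FOREACH grouped GENERATE group AS b, tups.(t,a) AS projected;

DUMP result;
```

Dropping `group AS b` from the GENERATE leaves only the bags, as in the question. Pig does not guarantee the order of tuples inside a bag, so the bags may list their tuples in a different order than the example shows.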

See the "Nested Projection" section on the PigLatin page. http://wiki.apache.org/pig/PigLatin
