Projecting Grouped Tuples in Pig
Question
I have a collection of tuples of the form (t,a,b) that I want to group by b in Pig. Once grouped, I want to filter out b from the tuples in each group and generate a bag of filtered tuples per group.
As an example, assume we have
(1,2,1)
(2,0,1)
(3,4,2)
(4,1,2)
(5,2,3)
The Pig script would produce
{(1,2),(2,0)}
{(3,4),(4,1)}
{(5,2)}
The question is: how do I go about producing this result? I'm used to seeing examples where aggregation operations follow a group by operation. It's less clear to me how to filter the tuples and return them in a bag. Thanks for your assistance!
Recommended answer
Turns out what I was looking for is the syntax for nested projection in Pig.
Given tuples of the form (t,a,b), b can be dropped after the GROUP BY by projecting only the remaining fields out of each group's bag:
grouped = GROUP tups BY b;
-- the bag in each group is named after the grouped relation (tups)
result = FOREACH grouped GENERATE tups.(t,a);
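For completeness, here is a minimal end-to-end sketch of the same idea. It assumes the sample rows are stored in a hypothetical comma-separated file named tuples.csv and that all three fields are integers; neither detail comes from the question.

```pig
-- Assumption: 'tuples.csv' is a hypothetical file with one "t,a,b" row per line.
tups = LOAD 'tuples.csv' USING PigStorage(',') AS (t:int, a:int, b:int);

-- Each row of grouped is (group, tups), where tups is the bag of
-- original (t,a,b) tuples that share the same value of b.
grouped = GROUP tups BY b;

-- Nested projection: keep only t and a from every tuple in the bag.
result = FOREACH grouped GENERATE tups.(t, a);

DUMP result;
```

Each output row holds a single bag of (t,a) tuples for one value of b. Pig does not guarantee the order of tuples within a bag, so the ordering may differ from the listing in the question. The script can be tried in local mode with `pig -x local`.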
See the "Nested Projection" section on the PigLatin page. http://wiki.apache.org/pig/PigLatin