How to flatten a group into a single tuple in Pig?
Question
From this:
(1, {(1,2), (1,3), (1,4)} )
(2, {(2,5), (2,6), (2,7)} )
...how could we generate this?
((1,2),(1,3),(1,4))
((2,5),(2,6),(2,7))
...and how could we generate this?
(1, 2, 3, 4)
(2, 5, 6, 7)
For a single row I know how to do it. The problem arises when I have to iterate over many rows and manipulate the inner bags at the same time.
Recommended answer
For your question, I prepared the following file:
1,2
1,3
1,4
2,5
2,6
2,7
First, I used the following script to build the relation r3 that you described in your question:
r1 = load 'test_file' using PigStorage(',') as (a:int, b:int);
r2 = group r1 by a;
r3 = foreach r2 generate group as a, r1 as b;
describe r3;
-- r3: {a: int,b: {(a: int,b: int)}}
-- r3 is like (1, {(1,2), (1,3), (1,4)} )
If we want to generate the following output,
(1, 2, 3, 4)
(2, 5, 6, 7)
we can use the following script:
r4 = foreach r3 generate a, FLATTEN(BagToTuple(b.b));
dump r4;
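To make the three steps in that one-liner easier to follow, here is a plain-Python model of them. It is only a sketch of the semantics, not Pig code: the function names project_field, bag_to_tuple and flatten_row are hypothetical stand-ins for the projection b.b, the built-in BagToTuple, and FLATTEN applied to a tuple.

```python
# Plain-Python model of: foreach r3 generate a, FLATTEN(BagToTuple(b.b));
# All function names are illustrative; none of them are Pig APIs.

def project_field(bag, index):
    """Model of b.b: keep a single field from every tuple in the bag."""
    return [(t[index],) for t in bag]

def bag_to_tuple(bag):
    """Model of BagToTuple: concatenate the fields of every tuple in the bag."""
    return tuple(field for t in bag for field in t)

def flatten_row(key, inner_tuple):
    """Model of FLATTEN on a tuple: splice its fields into the enclosing row."""
    return (key,) + inner_tuple

# r3 as described in the question: (group key, bag of (a, b) tuples)
r3 = [
    (1, [(1, 2), (1, 3), (1, 4)]),
    (2, [(2, 5), (2, 6), (2, 7)]),
]

r4 = [flatten_row(a, bag_to_tuple(project_field(b, 1))) for a, b in r3]
print(r4)  # [(1, 2, 3, 4), (2, 5, 6, 7)]
```

One caveat when carrying this back to Pig: bags are unordered, so the order of the values inside each output row is not guaranteed unless the bag is sorted first (for example with ORDER inside a nested foreach).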
As for the following output,
((1,2),(1,3),(1,4))
((2,5),(2,6),(2,7))
I could not find a helpful built-in function. The built-in BagToTuple concatenates the fields of every inner tuple instead of keeping the tuples nested, so you may need to write a custom version of it. Here is the source code of the built-in BagToTuple: http://www.grepcode.com/file/repo1.maven.org/maven2/org.apache.pig/pig/0.11.1/org/apache/pig/builtin/BagToTuple.java#BagToTuple.getOuputTupleSize%28org.apache.pig.data.DataBag%29
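As a rough idea of what such a custom function would do, here is a minimal Python sketch of the logic. The name bag_to_nested_tuple is hypothetical. It assumes the bag arrives as a list of tuples, which is how Pig's Jython UDF support is documented to pass bags; the schema declaration and the REGISTER step are left out, so treat this as an illustration of the logic rather than a ready-to-register UDF.

```python
def bag_to_nested_tuple(bag):
    """Return one tuple whose fields are the bag's tuples, kept intact.

    Unlike the built-in BagToTuple, the inner tuples are not concatenated.
    """
    if bag is None:
        return None
    return tuple(tuple(t) for t in bag)

# Same shape as r3 in the answer: (group key, bag of (a, b) tuples)
rows = [
    (1, [(1, 2), (1, 3), (1, 4)]),
    (2, [(2, 5), (2, 6), (2, 7)]),
]

for _key, bag in rows:
    print(bag_to_nested_tuple(bag))
# ((1, 2), (1, 3), (1, 4))
# ((2, 5), (2, 6), (2, 7))
```

A Java UDF modelled on the BagToTuple source linked above would follow the same idea: iterate over the bag and append each inner tuple as a single field of the output tuple, instead of appending its individual fields.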