How to flatten a group into a single tuple in Pig?


Question


From this:

(1, {(1,2), (1,3), (1,4)} )
(2, {(2,5), (2,6), (2,7)} )

...How could we generate this?

((1,2),(1,3),(1,4))
((2,5),(2,6),(2,7))

...And how could we generate this?

(1, 2, 3, 4)
(2, 5, 6, 7)

I know how to do this for a single row. The problem arises when I have to iterate over many rows and manipulate the inner groups at the same time.

Solution

For your question, I prepared the following file:

1,2
1,3
1,4
2,5
2,6
2,7

First, I used the following script to produce the input r3 that you described in your question:

r1 = load 'test_file' using PigStorage(',') as (a:int, b:int);
r2 = group r1 by a;
r3 = foreach r2 generate group as a, r1 as b;
describe r3;
-- r3: {a: int,b: {(a: int,b: int)}}
-- r3 is like (1, {(1,2), (1,3), (1,4)} )

If we want to generate the following content,

(1, 2, 3, 4)
(2, 5, 6, 7)

we can use the following script:

r4 = foreach r3 generate a, FLATTEN(BagToTuple(b.b));
dump r4;
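
To make the data flow concrete, here is a small Python model of what the script above computes. It is not Pig code, only an illustration, and the function names `group_by_first` and `flatten_group` are made up for this sketch. The projection `b.b` keeps only the second field of each inner tuple, `BagToTuple` turns the resulting bag into one tuple, and `FLATTEN` promotes that tuple's fields to top-level fields next to `a`.

```python
from collections import OrderedDict

def group_by_first(rows):
    """Model of `group r1 by a`: pair each key with the bag of full rows."""
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    return list(groups.items())

def flatten_group(key, bag):
    """Model of `generate a, FLATTEN(BagToTuple(b.b))`."""
    projected = [row[1] for row in bag]  # b.b: keep only field b of each inner tuple
    return (key,) + tuple(projected)     # BagToTuple + FLATTEN: splice values into the row

rows = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (2, 7)]
r3 = group_by_first(rows)
r4 = [flatten_group(key, bag) for key, bag in r3]
print(r4)  # [(1, 2, 3, 4), (2, 5, 6, 7)]
```

One caveat for the real script: Pig bags are unordered, so the order of the values after the key is not guaranteed unless the bag is sorted first.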

For the following content,

((1,2),(1,3),(1,4))
((2,5),(2,6),(2,7))

I could not find any built-in function that helps here. You may need to write your own custom variant of BagToTuple. Here is the source code of the built-in BagToTuple: http://www.grepcode.com/file/repo1.maven.org/maven2/org.apache.pig/pig/0.11.1/org/apache/pig/builtin/BagToTuple.java#BagToTuple.getOuputTupleSize%28org.apache.pig.data.DataBag%29
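
As a rough illustration of why a custom function is needed, the Python sketch below models the difference between the two behaviours. It is not a Pig UDF. `bag_to_tuple` reflects my understanding that the built-in BagToTuple un-nests the inner tuples, and `bag_to_nested_tuple` is a hypothetical name for the custom variant that would keep them intact.

```python
def bag_to_tuple(bag):
    """Model of the built-in BagToTuple: fields of the inner tuples are un-nested."""
    return tuple(field for row in bag for field in row)

def bag_to_nested_tuple(bag):
    """Hypothetical custom variant: each inner tuple is kept as one element."""
    return tuple(tuple(row) for row in bag)

bag = [(1, 2), (1, 3), (1, 4)]
print(bag_to_tuple(bag))         # (1, 2, 1, 3, 1, 4)
print(bag_to_nested_tuple(bag))  # ((1, 2), (1, 3), (1, 4))
```

A real implementation would be a Java UDF modelled on the linked BagToTuple source, appending each inner tuple to the output tuple as a single field instead of copying its fields one by one.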
