Flatten tuple like a bag

Views: 156 Published: 2018/5/31 19:31:42 Tags: hadoop apache-pig flatten


Question


My dataset looks like the following:
( A, (1,2) )
( B, (2,9) )
I would like to "flatten" the tuples in Pig, basically repeating each record for each value found in the inner tuple, so that the expected output is:
( A, 1 )
( A, 2 )
( B, 2 )
( B, 9 )
I know this is possible when the inner values (1,2) and (2,9) are bags instead of tuples.
Solution
Your insight is good; this is possible by transforming the tuple into a bag. The schema to aim for is {a: chararray,{(chararray)}}, for example: (A,{(1),(2)})

Here is the solution to your problem:
A = LOAD 'data.txt' AS (a:chararray, b:(b1:chararray, b2:chararray));
B = FOREACH A GENERATE a, TOBAG(b.b1, b.b2);
C = FOREACH B GENERATE a, FLATTEN($1);
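The magic part is the TOBAG function: it wraps each of its arguments in a single-field tuple and collects those tuples into a bag, which FLATTEN then expands into one row per element.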
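To see why the conversion to a bag matters, the sketch below contrasts FLATTEN applied directly to the tuple with FLATTEN applied to a bag built from the same fields. It assumes the same data.txt and schema as in the solution, with tab-separated fields (the PigStorage default). The relation names T and R are only illustrative.

```pig
-- Assumes data.txt holds tab-separated lines such as: A<TAB>(1,2)
A = LOAD 'data.txt' AS (a:chararray, b:(b1:chararray, b2:chararray));

-- FLATTEN on a tuple promotes its fields to columns of the same row,
-- so the number of records does not change: (A,1,2) and (B,2,9).
T = FOREACH A GENERATE a, FLATTEN(b);

-- FLATTEN on a bag emits one row per tuple in the bag,
-- which gives the desired (A,1) (A,2) (B,2) (B,9).
R = FOREACH A GENERATE a, FLATTEN(TOBAG(b.b1, b.b2));

DUMP T;
DUMP R;
```

Calling TOBAG inside FLATTEN in a single FOREACH should behave the same as the two-step version; keeping the steps separate can make it easier to inspect the intermediate schema with DESCRIBE.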
