How to flatten a recursive hierarchy using Hive/Pig/MapReduce


Problem description


I have unbalanced tree data stored in a tabular format, like this:

parent,child
a,b
b,c
c,d
c,f
f,g

The depth of the tree is unknown.

How can I flatten this hierarchy so that each row contains the entire path from a leaf node to the root node, like this:

leaf node, root node, intermediate nodes
d,a,d:c:b
g,a,g:f:c:b

Any suggestions for solving this problem using Hive, Pig, or MapReduce? Thanks in advance.

Solution

I tried to solve it using Pig; here is the sample code:

Join function:

-- Join each row's current top ancestor to that ancestor's own parent
DEFINE join_hierarchy(leftA, source, result) RETURNS output {
    joined = JOIN $leftA BY parent LEFT, $source BY child;
    -- Rows with no match have reached the root; move them into the result
    tmp_filtered = FILTER joined BY $source::parent IS NULL;
    part = FOREACH tmp_filtered GENERATE $leftA::child AS child, $leftA::path AS path;
    $result = UNION part, $result;
    -- Remaining rows climb one level and prepend the new ancestor to the path
    part_remaining = FILTER joined BY $source::parent IS NOT NULL;
    $output = FOREACH part_remaining GENERATE $leftA::child AS child, $source::parent AS parent, CONCAT(CONCAT($source::parent, ':'), $leftA::path) AS path;
};
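
To make the intent of one join step easier to follow, here is a minimal plain-Python model of it. This is only an illustrative sketch, not Pig code: the function name `join_step` and the tuple layout are assumptions made for this example, and it assumes every node has at most one parent.

```python
def join_step(rows, edges):
    """Model one pass of the join described above.

    rows:  list of (child, top, path), where top is the highest ancestor found so far
    edges: list of (parent, child) pairs from the source table
    Returns (finished, remaining).
    """
    # Lookup table child -> parent (assumes each node has at most one parent)
    parent_of = {child: parent for parent, child in edges}
    finished, remaining = [], []
    for child, top, path in rows:
        grand = parent_of.get(top)
        if grand is None:
            # No parent found for top: this row has reached the root
            finished.append((child, path))
        else:
            # Climb one level and prepend the new ancestor to the path
            remaining.append((child, grand, grand + ":" + path))
    return finished, remaining


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("f", "g")]
# Initial rows mirror the extra path column: path starts as "<parent>:"
rows = [(child, parent, parent + ":") for parent, child in edges]
finished, remaining = join_step(rows, edges)
```

After one step on the sample data, only the row for `b` is finished, because its parent `a` is the root; every other row has moved one level up.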

Load dataset:

-- My dataset field delimiter is ','.
source = LOAD '*****' USING PigStorage(',') AS (parent:chararray, child:chararray);
-- Create an additional column for the path
leftA = FOREACH source GENERATE child, parent, CONCAT(parent, ':') AS path;

-- Initially the result table holds only a blank placeholder row.
result = LIMIT leftA 1;
result = FOREACH result GENERATE '' AS child, '' AS parent;
-- Flatten the hierarchy to 4 levels. Repeat the call below once per level of hierarchy depth.

leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
leftA= join_hierarchy(leftA, source, result);
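
Because the depth is unknown, a fixed number of calls may stop short of the root. One way to picture a depth-independent version is to repeat the step until no rows are left. The sketch below shows that idea in plain Python rather than Pig; `flatten_hierarchy` is a hypothetical name, and the code assumes the data is an acyclic tree in which every node has at most one parent.

```python
def flatten_hierarchy(edges):
    """Return sorted (leaf, root, path) rows for (parent, child) edges.

    Assumes an acyclic tree; a cycle in the data would make the loop run forever.
    """
    edges = list(edges)
    parent_of = {child: parent for parent, child in edges}
    parents = {parent for parent, _ in edges}
    # A leaf is a node that never appears in the parent column
    leaves = [child for _, child in edges if child not in parents]

    # Working rows: (leaf, current top node, nodes visited below the top)
    rows = [(leaf, leaf, []) for leaf in leaves]
    done = []
    while rows:  # one iteration corresponds to one join step
        next_rows = []
        for leaf, top, path in rows:
            if top in parent_of:
                next_rows.append((leaf, parent_of[top], path + [top]))
            else:
                # top has no parent, so it is the root
                done.append((leaf, top, ":".join(path)))
        rows = next_rows
    return sorted(done)


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("f", "g")]
result = flatten_hierarchy(edges)
for leaf, root, path in result:
    print(",".join([leaf, root, path]))
```

For the sample data this prints `d,a,d:c:b` and `g,a,g:f:c:b`. The number of loop iterations equals the depth of the deepest leaf, which is the count the repeated macro calls would have to match.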
