Hadoop Pig: joining on any matching tuple values

Views: 30 | Published: 2021/11/12 4:19:07 | Tags: arrays, join, hadoop, apache-pig

Problem description

I'm new to Pig and am trying to use it to process a dataset. I have a set of records that looks like this:

id    elements
--------------
1     ["a","b","c"]
2     ["a","f","g"]
3     ["f","g","h"]

The idea is that I want to produce pairs of ids whose elements arrays share at least one element. If elements were just a single item instead of an array, I could do a simple join like:

A = LOAD 'mydata' ...
B = FOREACH A GENERATE id as id_2, elements as elements_2;
C = JOIN A BY elements, B BY elements_2;

But since elements is an array, this won't work when there is only a partial overlap. Any thoughts on how to do this in Pig?

The intended output would list the pairs of ids that overlap:

(1,2)
(2,3)
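
One possible way to get there, as a rough sketch rather than a definitive answer: flatten the array so that each (id, element) pair becomes its own row, self-join on the single element, then keep each pair of ids once. The sketch assumes that elements can be loaded as a Pig bag of single-field tuples (for example `{(a),(b),(c)}` in a tab-separated file); the LOAD schema and the alias names flat_a, flat_b, joined, pairs, ordered and result are made up for illustration and are not from the original post.

```pig
-- Assumption: 'mydata' is tab-separated and elements is stored as a bag literal,
-- e.g.  1<TAB>{(a),(b),(c)}
A = LOAD 'mydata' AS (id:int, elements:bag{t:tuple(e:chararray)});

-- One row per (id, element) instead of one row per id.
flat_a = FOREACH A GENERATE id, FLATTEN(elements) AS element;

-- Renamed copy of the flattened relation, so the self-join has two distinct aliases.
flat_b = FOREACH flat_a GENERATE id AS id_2, element AS element_2;

-- Two ids end up in the same joined row whenever they share an element.
joined = JOIN flat_a BY element, flat_b BY element_2;
pairs = FOREACH joined GENERATE flat_a::id AS id, flat_b::id_2 AS id_2;

-- Drop self-matches and mirrored pairs such as (2,1).
ordered = FILTER pairs BY id < id_2;

-- Ids that share several elements (2 and 3 share f and g) would otherwise repeat.
result = DISTINCT ordered;

DUMP result;
```

For the three sample records this should leave the pairs (1,2) and (2,3), although the order in which DUMP prints them is not guaranteed. If the data is really stored as JSON-style arrays like `["a","b","c"]`, it would first have to be converted into a bag, for example with a suitable loader or a UDF.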
