How to compute the intersections and unions of two arrays in Hive?


Problem description

For example, the intersection

select intersect(array("A","B"), array("B","C"))

should return

["B"]

and the union

 select union(array("A","B"), array("B","C"))

should return

["A","B","C"]

What's the best way to do this in Hive? I have checked the Hive documentation but cannot find anything relevant.

Solution

Go to the githubLink, where Klout has published a large collection of UDFs. Download the source, build the JAR, and add the JAR to your Hive session. For example, for the union:

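-- Register the two functions (run this after adding the JAR to the session).
CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';

-- combine merges the two arrays; combine_unique aggregates the result and drops duplicates.
SELECT combine_unique(combine(array('a','b','c'), array('b','c','d'))) FROM reqtable;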

OK
["d","b","c","a"]
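
The example above covers only the union. If adding an external JAR is not an option, both operations can probably be approximated with Hive built-ins alone. The following is a minimal sketch, not part of the original answer: it assumes a Hive version that allows SELECT without a FROM clause (0.13 or later), and the aliases x, a, b, t, arr_union, and arr_intersection are made up for illustration. Note that collect_set does not guarantee element order.

```sql
-- Union: explode both arrays into rows, stack them, then collect distinct values.
SELECT collect_set(x) AS arr_union
FROM (
  SELECT explode(array('A','B')) AS x
  UNION ALL
  SELECT explode(array('B','C')) AS x
) t;

-- Intersection: explode both arrays and keep only the values present on both sides.
SELECT collect_set(a.x) AS arr_intersection
FROM (SELECT explode(array('A','B')) AS x) a
JOIN (SELECT explode(array('B','C')) AS x) b
  ON a.x = b.x;
```

For array columns in a real table, the same idea should work with LATERAL VIEW explode(...) plus a GROUP BY on the row key, at the cost of a join or shuffle per query.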
