How to get first n elements in an array in Hive

Views: 8069 · Published: 2018/6/12 14:01:13 · hive


Problem description

I use the split function to create an array in Hive. How can I get the first n elements of the array? I want to iterate over the resulting sub-array.

Code example:

select col1 from table where split(col2, ',')[0:5]

The '[0:5]' slice looks like Python style, but it doesn't work here.

Solution

This is a tricky one.

First grab the Brickhouse jar.

Then add it to Hive:

add jar /path/to/jars/brickhouse-0.7.0-SNAPSHOT.jar;

Now create the two functions we will be using:

CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';

The query is:

select a,
       n as array_index,
       array_index(split(a, ','), n) as value_from_Array
from (
  select "abc#1,def#2,hij#3" a from dual
  union all
  select "abc#1,def#2,hij#3,zzz#4" a from dual
) t1
lateral view numeric_range(length(a) - length(regexp_replace(a, ',', '')) + 1) n1 as n

Explanation:

select "abc#1,def#2,hij#3" a from dual
union all
select "abc#1,def#2,hij#3,zzz#4" a from dual

This just selects some test data; in your case, replace it with your table name.

lateral view numeric_range( length(a)-length(regexp_replace(a,',',''))+1 ) n1 as n

numeric_range is a UDTF that returns a table for a given range. In this case, I asked for a range starting at 0 (the default) and covering the number of elements in the string, calculated as the number of commas + 1.

This way, each row is repeated once per element in the given column, with n taking a different index on each copy.
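
As a quick sanity check of the element-count expression, the sketch below compares it with Hive's built-in size(split(...)). It assumes a Hive version that accepts a SELECT without a FROM clause inside the subquery (the original query relies on a dual table instead); the literal is simply one of the test strings from the query.

```sql
-- Count elements two ways: number of commas + 1, and size() of the split array.
-- The inline subquery is an assumption; swap in your own table and column.
select length(a) - length(regexp_replace(a, ',', '')) + 1 as n_by_commas,
       size(split(a, ','))                                as n_by_size
from (select 'abc#1,def#2,hij#3' as a) t;
```

For this input both columns should come out as 3, since the string contains two commas.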

array_index(split(a,','),n)

This is exactly like using split(a,',')[n], but Hive doesn't support it.

So we get the n-th element for each repeated row of the initial string, resulting in:

a                          array_index  value_from_Array
abc#1,def#2,hij#3,zzz#4    0            abc#1
abc#1,def#2,hij#3,zzz#4    1            def#2
abc#1,def#2,hij#3,zzz#4    2            hij#3
abc#1,def#2,hij#3,zzz#4    3            zzz#4
abc#1,def#2,hij#3          0            abc#1
abc#1,def#2,hij#3          1            def#2
abc#1,def#2,hij#3          2            hij#3
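
To make the correspondence between array_index and bracket indexing concrete, here is a minimal sketch using a constant index, which the bracket form does accept. It assumes the Brickhouse functions were registered as shown earlier and that the Hive version allows a subquery without a FROM clause.

```sql
-- Same element fetched two ways, using the constant index 1.
-- array_index is the Brickhouse UDF registered earlier; the subquery is test data.
select split(a, ',')[1]              as via_brackets,
       array_index(split(a, ','), 1) as via_array_index
from (select 'abc#1,def#2,hij#3' as a) t;
```

Going by the result table above, both columns would be expected to hold def#2. The UDF earns its place once the index is the column n produced by the lateral view rather than a literal.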

If you really want a specific number of elements (say 5), then just use:

lateral view numeric_range(5) n1 as n
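
Put together for the table in the question, a sketch of the "first 5 elements" query might look like the following. The table name my_table is hypothetical (the question's "table" is a reserved word), col1 and col2 come from the question, and the Brickhouse functions are assumed to be registered as above. How array_index behaves for an index past the end of a shorter array is not covered here, so the WHERE clause filters those rows out using Hive's built-in size().

```sql
-- One output row per element, limited to the first 5 elements of split(col2, ',').
-- my_table is a placeholder name; numeric_range(5) is assumed to yield 0..4,
-- by analogy with the 0-based output shown above.
select col1,
       n                                as idx,
       array_index(split(col2, ','), n) as element
from my_table
lateral view numeric_range(5) n1 as n
where n < size(split(col2, ','));
```

Each row of my_table then yields at most five rows, one per leading element, which can be processed further in an outer query.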
