How to get the first n elements of an array in Hive
Problem description
I use the split function to create an array in Hive. How can I get the first n elements of the array, and then iterate over that sub-array?
Code example
select col1 from table
where split(col2, ',')[0:5]
'[0:5]' looks like Python-style slicing, but it doesn't work here.
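For reference, this is what the asker expects the Python-style slice to return when applied to the output of split. It is a plain-Python illustration of the intended result, not Hive code, and first_n is a hypothetical helper name:

```python
def first_n(s, n, sep=","):
    """Python analogue of the intended split(col2, ',')[0:n]."""
    # Slicing past the end of a Python list just truncates; it never raises.
    return s.split(sep)[0:n]


print(first_n("abc#1,def#2,hij#3", 5))  # ['abc#1', 'def#2', 'hij#3']
```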
Recommended answer
This is a tricky one.
First, get the Brickhouse jar and add it to Hive:
add jar /path/to/jars/brickhouse-0.7.0-SNAPSHOT.jar;
Now create the two functions we will be using:
CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';
The query would be:
select a, n as array_index, array_index(split(a,','),n) as value_from_Array
from ( select "abc#1,def#2,hij#3" a from dual union all
       select "abc#1,def#2,hij#3,zzz#4" a from dual) t1
lateral view numeric_range( length(a)-length(regexp_replace(a,',',''))+1 ) n1 as n
Explanation:
select "abc#1,def#2,hij#3" a from dual union all select "abc#1,def#2,hij#3,zzz#4" a from dual
This just selects some test data; in your case, replace it with your table.
lateral view numeric_range( length(a)-length(regexp_replace(a,',',''))+1 ) n1 as n
numeric_range is a UDTF that returns a table for a given range. In this case, I asked for a range from 0 (the default start) up to the number of elements in the string, computed as the number of commas + 1. As the result below shows, the upper bound is exclusive, so n runs from 0 to (number of elements - 1).
This way, each row is multiplied by the number of elements in the given column.
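A plain-Python sketch of what this lateral view computes, assuming numeric_range(k) behaves like Python's range(k) (0 up to k - 1, as the result below suggests) and counting elements as commas + 1 exactly as the query does. The helper names are hypothetical and this is not the Brickhouse implementation:

```python
def element_count(a):
    # Same trick as length(a) - length(regexp_replace(a, ',', '')) + 1:
    # the number of commas in the string, plus one.
    return len(a) - len(a.replace(",", "")) + 1


def numeric_range(k):
    # Assumed semantics: integers from 0 (default start) up to k - 1.
    return list(range(k))


rows = ["abc#1,def#2,hij#3", "abc#1,def#2,hij#3,zzz#4"]

# The lateral view pairs each input row with every value of n,
# so a row with k elements turns into k rows.
expanded = [(a, n) for a in rows for n in numeric_range(element_count(a))]
print(len(expanded))  # 7 (3 + 4)
```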
array_index(split(a,','),n)
This is exactly like using split(a,',')[n], but Hive doesn't support that here: older Hive versions reject a non-constant expression such as n as an array index.
So for each duplicated row of the initial string we get its n-th element, resulting in:
a                          array_index  value_from_Array
abc#1,def#2,hij#3,zzz#4    0            abc#1
abc#1,def#2,hij#3,zzz#4    1            def#2
abc#1,def#2,hij#3,zzz#4    2            hij#3
abc#1,def#2,hij#3,zzz#4    3            zzz#4
abc#1,def#2,hij#3          0            abc#1
abc#1,def#2,hij#3          1            def#2
abc#1,def#2,hij#3          2            hij#3
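To tie the pieces together, here is a plain-Python simulation of the whole query, modeling array_index(split(a, ','), n) as ordinary list indexing. Row order aside (Hive does not guarantee an order without ORDER BY), it yields the same rows as above. It is a sketch for understanding the mechanics, not a replacement for the Hive query; simulate_query is a hypothetical name:

```python
def simulate_query(rows):
    """Return (a, array_index, value_from_Array) tuples like the Hive query."""
    out = []
    for a in rows:
        parts = a.split(",")                            # split(a, ',')
        count = len(a) - len(a.replace(",", "")) + 1    # commas + 1
        for n in range(count):                          # lateral view numeric_range(count) n1 as n
            out.append((a, n, parts[n]))                # array_index(split(a, ','), n)
    return out


for row in simulate_query(["abc#1,def#2,hij#3", "abc#1,def#2,hij#3,zzz#4"]):
    print(*row, sep="\t")
```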
If you really want a specific number of elements (say 5), just use:
lateral view numeric_range(5) n1 as n
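With a fixed range, the number of output rows per input no longer depends on the string, so an input with fewer than 5 elements produces indexes past the end of its array. The plain-Python sketch below simply drops those indexes to get "at most the first n" elements. How Brickhouse's array_index actually treats an out-of-range index is not covered here, so check that before relying on it; first_n_rows is a hypothetical helper:

```python
def first_n_rows(rows, n):
    """(a, index, value) tuples for at most the first n elements of each row."""
    out = []
    for a in rows:
        parts = a.split(",")
        for i in range(n):          # lateral view numeric_range(n): i = 0 .. n-1
            if i < len(parts):      # assumption: skip indexes past the end
                out.append((a, i, parts[i]))
    return out


print(first_n_rows(["abc#1,def#2,hij#3,zzz#4"], 2))
# [('abc#1,def#2,hij#3,zzz#4', 0, 'abc#1'), ('abc#1,def#2,hij#3,zzz#4', 1, 'def#2')]
```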