在“结构"列上进行聚合 [英] Aggregation On Struct columns Hive
问题描述
我有一个struct数组,我试图找到struct列的计数,总和和不同的值.
I have a array of struct and I am trying to find count, sum, distinct values of struct column.
create table temp (regionkey smallint, name string, comment string, nations array<struct<n_nationkey:smallint,n_name:string,n_comment:string>>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
MAP KEYS TERMINATED BY ',';
当我尝试运行查询时
select name,
count(nations.n_nationkey) as count,
sum(nations.n_nationkey) as sum,
ndv(nations.n_nationkey) as distinct_val
from temp
group by name
order by name;
我得到了错误
FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but array<smallint> is passed.
我想做的是找到n_nationkey的计数,总和和不同的值.
What I want to do is find the count, sum and distinct value of n_nationkey.
任何帮助将不胜感激.
推荐答案
select t.name
,count (e.col.n_nationkey) as count
,sum (e.col.n_nationkey) as sum
,count (distinct e.col.n_nationkey) as distinct_val
from temp t lateral view explode (t.nations) e
group by t.name
order by t.name
;
对于OP
具有别名的相同解决方案. nations
不是结构.它是结构的数组.
它没有 n_nationkey
属性.它具有具有 n_nationkey
属性的struct元素. explode
函数采用一个结构体数组( nations
),并在单独的行中返回每个结构体( nations
).
For the OP
The same solution with an alias.
nations
is not a struct. It is an array of structs.
It does not have an n_nationkey
attribute. It has struct elements that have n_nationkey
attributes.
The explode
function takes an array of structs (nations
) and return each struct (nation
) in a separate row.
select t.name
,count (e.nation.n_nationkey) as count
,sum (e.nation.n_nationkey) as sum
,count (distinct e.nation.n_nationkey) as distinct_val
from temp t lateral view explode (t.nations) e as nation
group by t.name
order by t.name
;
这篇关于在“结构"列上进行聚合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!