Hive Explode/横向查看多个阵列 [英] Hive Explode / Lateral View multiple arrays
问题描述
我有一个具有以下架构的配置单元表:
I have a hive table with the following schema:
COOKIE | PRODUCT_ID | CAT_ID | QTY
1234123 [1,2,3] [r,t,null] [2,1,null]
如何对数组进行标准化,以便得到以下结果
How can I normalize the arrays so I get the following result
COOKIE | PRODUCT_ID | CAT_ID | QTY
1234123 [1] [r] [2]
1234123 [2] [t] [1]
1234123 [3] null null
我尝试了以下方法:
select concat_ws('|',visid_high,visid_low) as cookie
,pid
,catid
,qty
from table
lateral view explode(productid) ptable as pid
lateral view explode(catalogId) ptable2 as catid
lateral view explode(qty) ptable3 as qty
然而结果是笛卡尔积.
推荐答案
您可以使用 Brickhouse ( http://github.com/klout/brickhouse )来解决这个问题.在 http://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/
You can use the numeric_range
and array_index
UDFs from Brickhouse ( http://github.com/klout/brickhouse ) to solve this problem. There is an informative blog posting describing in detail over at http://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/
使用这些 UDF,查询将类似于
Using those UDFs, the query would be something like
select cookie,
array_index( product_id_arr, n ) as product_id,
array_index( catalog_id_arr, n ) as catalog_id,
array_index( qty_id_arr, n ) as qty
from table
lateral view numeric_range( size( product_id_arr )) n1 as n;
这篇关于Hive Explode/横向查看多个阵列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!