Hive爆炸/横向视图多个阵列 [英] Hive Explode / Lateral View multiple arrays

查看:951
本文介绍了Hive爆炸/横向视图多个阵列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个具有以下模式的配置表:

  COOKIE | PRODUCT_ID | CAT_ID |数量
1234123 [1,2,3] [r,t,null] [2,1,null]

我如何规范化数组,以便得到以下结果

  COOKIE | PRODUCT_ID | CAT_ID |数量

1234123 [1] [r] [2]

1234123 [2] [t] [1]

1234123 [3] null null

我试过以下内容:

 














横向视图爆炸(productid)ptable作为pid
横向视图爆炸(catalogId)ptable2作为catid
横向视图爆炸(qty)ptable3作为数量

然而结果是以笛卡儿积的形式出现的。 您可以使用来自Brickhouse( numeric_range 和 array_index UDF klout / brickhouserel =noreferrer> http://github.com/klout/brickhouse )来解决这个问题。这里有详细的博客文章,详细描述 http://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/



使用这些UDF,查询可能类似于

 选择cookie,
array_index(product_id_arr,n)作为product_id,
array_index(catalog_id_arr,n)作为catalog_id,
array_index(qty_id_arr,n)作为数量
从表
横向视图numeric_range(size (product_id_arr))n1 as n;


I have a hive table with the following schema:

COOKIE  | PRODUCT_ID | CAT_ID |    QTY    
1234123   [1,2,3]    [r,t,null]  [2,1,null]

How can I normalize the arrays so I get the following result

COOKIE  | PRODUCT_ID | CAT_ID |    QTY

1234123   [1]          [r]         [2]

1234123   [2]          [t]         [1] 

1234123   [3]          null       null 

I have tried the following:

select concat_ws('|',visid_high,visid_low) as cookie
,pid
,catid 
,qty
from table
lateral view explode(productid) ptable as pid
lateral view explode(catalogId) ptable2 as catid 
lateral view explode(qty) ptable3 as qty

however the result comes out as a Cartesian product.

解决方案

You can use the numeric_range and array_index UDFs from Brickhouse ( http://github.com/klout/brickhouse ) to solve this problem. There is an informative blog posting describing in detail over at http://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/

Using those UDFs, the query would be something like

select cookie,
   array_index( product_id_arr, n ) as product_id,
   array_index( catalog_id_arr, n ) as catalog_id,
   array_index( qty_id_arr, n ) as qty
from table
lateral view numeric_range( size( product_id_arr )) n1 as n;

这篇关于Hive爆炸/横向视图多个阵列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆