使用 HiveQL 分解结构数组 [英] Exploding Array of Struct using HiveQL

查看:19
本文介绍了使用 HiveQL 分解结构数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

CREATE TABLE IF NOT EXISTS Table2
(
USER_ID BIGINT,
PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>
) ROW FORMAT
 DELIMITED FIELDS TERMINATED BY '-'
 collection items terminated by ','
 map keys terminated by ':'
 LINES TERMINATED BY '
'
 STORED AS TEXTFILE
 LOCATION '/user/rj/output2';

以下是表2中的数据

1345653-110909316904:1341894546,221065796761:1341887508

我可以通过使用下面的查询来分解上面的数据,它适用于上面的数据-

I can explode the above data by using this below query and it works fine for above data-

SELECT  * FROM (select user_id, prod_and_ts.product_id as product_id,
prod_and_ts.timestamps as timestamps FROM table2 LATERAL VIEW
explode(purchased_item) exploded_table as prod_and_ts) prod_and_ts;

我会得到这样的输出,这很好-

And I will get output like this which is fine-

1345653                                110909316904     1341894546
1345653                                221065796761     1341887508

但在某些情况下,我在下表中有这样的数据,对于相同的 product_id- 用井号附加多个时间戳-

But in some cases I have data in the table below like this, multiple timestamp appended by pound sign for same product_id-

1345653-110909316904:1341894546#1341885695,221065796761:1341887508#1341885453

我需要使用 HiveQL 查询对上述数据进行这样的输出-

And I need output like this for above data using the HiveQL query-

1345653                                110909316904     1341894546
1345653                                110909316904    1341885695
1345653                                221065796761     1341887508
1345653                                221065796761    1341885453

这有可能以某种方式做到这一点吗?

Is this possible to do this somehow?

任何建议将不胜感激.?

Any suggestions will be appreciated.?

P.S 几天前我问过这个问题,但在那种情况下数据不同,现在数据完全不同,我需要类似的输出.

P.S I ask this question few days back, but in that case data was different and now data is totally different and I need similar output.

推荐答案

您可以使用函数 regexp_replace 或 regex_extract 来仅获取产品 ID.试试这个:

You can use the function regexp_replace or regex_extract to get only the product id. Try this:

SELECT  * FROM (select user_id, prod_and_ts.product_id as product_id,
regex_replace(prod_and_ts.timestamps, "#\d*", "")  as timestamps FROM table2 LATERAL VIEW
explode(purchased_item) exploded_table as prod_and_ts) prod_and_ts;

这篇关于使用 HiveQL 分解结构数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆