Exploding Array of Struct using HiveQL

Views: 238 Published: 2018/6/12 13:42:10 Tags: sql hive hiveql


Problem description

CREATE TABLE IF NOT EXISTS Table2
(
  USER_ID BIGINT,
  PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT, TIMESTAMPS: STRING>>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '-'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/rj/output2';

Below is the data in Table2:

1345653-110909316904:1341894546,221065796761:1341887508

I can explode the above data with the query below, and it works fine for this data:

SELECT * FROM (
  SELECT user_id, prod_and_ts.product_id AS product_id,
         prod_and_ts.timestamps AS timestamps
  FROM table2 LATERAL VIEW explode(purchased_item) exploded_table AS prod_and_ts
) prod_and_ts;

I get output like this, which is fine:

1345653    110909316904    1341894546
1345653    221065796761    1341887508

But in some cases the data in the table looks like this, with multiple timestamps for the same product_id joined by a pound sign (#):

1345653-110909316904:1341894546#1341885695,221065796761:1341887508#1341885453

For the above data, I need output like this from a HiveQL query:

1345653    110909316904    1341894546
1345653    110909316904    1341885695
1345653    221065796761    1341887508
1345653    221065796761    1341885453

Is it possible to do this somehow?

Any suggestions will be appreciated.

P.S. I asked this question a few days ago, but in that case the data was different. Now the data is totally different, and I need similar output.

Solution

You can use the function regexp_replace or regexp_extract to keep only the first timestamp of each product. Note that this discards every timestamp after a #, so it returns one row per product rather than one row per timestamp. Try this:

SELECT * FROM (
  SELECT user_id, prod_and_ts.product_id AS product_id,
         regexp_replace(prod_and_ts.timestamps, "#\\d*", "") AS timestamps
  FROM table2 LATERAL VIEW explode(purchased_item) exploded_table AS prod_and_ts
) prod_and_ts;
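
If one row per timestamp is needed, as in the output requested in the question, one possible approach (not part of the original answer) is to split the timestamps string on # and explode the result in a second LATERAL VIEW. The following is a minimal sketch under the table definition above; the aliases t, exploded_items, item, exploded_ts and ts are made up for illustration.

```sql
-- Sketch: first explode the array of structs, then explode the
-- '#'-separated timestamps of each struct into separate rows.
-- Aliases (t, item, ts, ...) are illustrative, not from the original post.
SELECT t.user_id,
       item.product_id AS product_id,
       ts              AS timestamps
FROM table2 t
LATERAL VIEW explode(t.purchased_item) exploded_items AS item
LATERAL VIEW explode(split(item.timestamps, '#')) exploded_ts AS ts;
```

A second LATERAL VIEW can refer to columns produced by the first one, which is what lets split() operate on each exploded struct. Rows whose timestamps contain no # still come through unchanged, because split() then returns a one-element array.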
