Exploding Array of Struct using HiveQL
Problem description
CREATE TABLE IF NOT EXISTS Table2
(
USER_ID BIGINT,
PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>
) ROW FORMAT
DELIMITED FIELDS TERMINATED BY '-'
collection items terminated by ','
map keys terminated by ':'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/rj/output2';
Below is the data in Table2:
1345653-110909316904:1341894546,221065796761:1341887508
I can explode the above data using the query below, and it works fine for that data:
SELECT * FROM (select user_id, prod_and_ts.product_id as product_id,
prod_and_ts.timestamps as timestamps FROM table2 LATERAL VIEW
explode(purchased_item) exploded_table as prod_and_ts) prod_and_ts;
And I get output like this, which is fine:
1345653 110909316904 1341894546
1345653 221065796761 1341887508
But in some cases the table contains data like this, with multiple timestamps joined by a pound sign (#) for the same product_id:
1345653-110909316904:1341894546#1341885695,221065796761:1341887508#1341885453
And I need output like this for the above data, using a HiveQL query:
1345653 110909316904 1341894546
1345653 110909316904 1341885695
1345653 221065796761 1341887508
1345653 221065796761 1341885453
Is it possible to do this somehow?
Any suggestions will be appreciated.
P.S. I asked a similar question a few days ago, but the data was different then. Now the data is completely different, and I need similar output.
You can use the function regexp_replace or regexp_extract to clean up the timestamps field. The query below removes every '#'-prefixed timestamp, so only the first timestamp is kept for each product ID. Try this:
SELECT * FROM (select user_id, prod_and_ts.product_id as product_id,
regexp_replace(prod_and_ts.timestamps, "#\\d*", "") as timestamps FROM table2 LATERAL VIEW
explode(purchased_item) exploded_table as prod_and_ts) prod_and_ts;
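To get one row per timestamp, as the question asks, a possible approach is to split the timestamps string on '#' and explode the resulting array with a second LATERAL VIEW. This is a minimal sketch rather than a tested solution: it assumes the Table2 definition shown in the question, and the aliases t, ts_table, and ts are names made up for illustration.

```sql
-- Sketch only: assumes Table2 as defined in the question.
-- The aliases t, ts_table and ts are hypothetical names chosen here.
SELECT t.user_id,
       prod_and_ts.product_id AS product_id,
       ts AS timestamps
FROM table2 t
-- First explode: one row per struct in the purchased_item array.
LATERAL VIEW explode(purchased_item) exploded_table AS prod_and_ts
-- Second explode: split "a#b" into an array and emit one row per element.
LATERAL VIEW explode(split(prod_and_ts.timestamps, '#')) ts_table AS ts;
```

Rows whose timestamps field holds a single value with no '#' should still come through unchanged, since split then returns a one-element array.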