蜂巢-将值范围之间的一行拆分为多行 [英] hive - split a row into multiple rows between the range of values
本文介绍了蜂巢-将值范围之间的一行拆分为多行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我在下面有一张桌子,想按从开始到结束的范围对行进行拆分.
I have a table below and would like to split the rows by the range from start to end columns.
,即id和value应该在开始&两次之间重复每个值结束(包括两端)
i.e id and value should repeat for each value between start & end(both inclusive)
--------------------------------------
id | value | start | end
--------------------------------------
1 | 5 | 1 | 4
2 | 8 | 5 | 9
--------------------------------------
所需的输出
--------------------------------------
id | value | current
--------------------------------------
1 | 5 | 1
1 | 5 | 2
1 | 5 | 3
1 | 5 | 4
2 | 8 | 5
2 | 8 | 6
2 | 8 | 7
2 | 8 | 8
2 | 8 | 9
--------------------------------------
我可以在Java/python中编写自己的UDF以得到此结果,但想检查是否可以使用任何现有的Hive UDF在Hive SQL中实现
I can write my own UDF in java/python to get this result but would like to check if I can implement in Hive SQL using any existing hive UDFs
谢谢.
推荐答案
您可以使用posexplode()UDF.
You can do using posexplode() UDF.
WITH
data AS (
SELECT 1 AS id, 5 AS value, 1 AS start, 4 AS `end`
UNION ALL
SELECT 2 AS id, 8 AS value, 5 AS start, 9 AS `end`
)
SELECT distinct id, value, (zr.start+rge.diff) as `current`
FROM data zr LATERAL VIEW posexplode(split(space(zr.`end`-zr.start),' ')) rge as diff, x
这是它的输出:
+-----+--------+----------+--+
| id | value | current |
+-----+--------+----------+--+
| 1 | 5 | 1 |
| 1 | 5 | 2 |
| 1 | 5 | 3 |
| 1 | 5 | 4 |
| 2 | 8 | 5 |
| 2 | 8 | 6 |
| 2 | 8 | 7 |
| 2 | 8 | 8 |
| 2 | 8 | 9 |
+-----+--------+----------+--+
这篇关于蜂巢-将值范围之间的一行拆分为多行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文