How to avoid evaluating the same calculated column repeatedly in a Hive query
Problem description
Let's say I have a calculated column:
select str_to_map("k1:1,k2:2,k3:3")["k1"] as col1,
str_to_map("k1:1,k2:2,k3:3")["k2"] as col2,
str_to_map("k1:1,k2:2,k3:3")["k3"] as col3;
How do I 'fix' the column so that it is calculated only once and its value can be accessed multiple times in the query? The map being calculated is the same; only different keys are accessed for different columns. Performing the same calculation repeatedly is a waste of resources. This example is deliberately oversimplified, but the point is that I want to know how to avoid this kind of redundancy in Hive in general.
In general, use subqueries; they are calculated once.
select map_col["k1"] as col1,
map_col["k2"] as col2,
map_col["k3"] as col3
from
(
select str_to_map("k1:1,k2:2,k3:3") as map_col from table...
)s;
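The same idea can also be written with a common table expression (a WITH clause), which Hive supports from version 0.13 onward. This is only a sketch of an alternative spelling, not something the answer above proposes: `src_table` is a hypothetical table name, and whether the expression is actually evaluated once depends on the optimizer, so it is worth checking the plan with EXPLAIN.

```sql
-- Sketch only: src_table is a hypothetical source table.
-- The map is built in one place and referenced by alias afterwards.
WITH s AS (
  SELECT str_to_map("k1:1,k2:2,k3:3") AS map_col
  FROM src_table
)
SELECT map_col["k1"] AS col1,
       map_col["k2"] AS col2,
       map_col["k3"] AS col3
FROM s;
```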
You can also materialize a query into a table to reuse the dataset across different queries or workflows.
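A minimal sketch of materializing the computed map with CREATE TABLE AS SELECT might look like the following. The names `src_table` and `map_cache` are hypothetical and stand in for whatever source and target tables apply in your case.

```sql
-- Sketch only: src_table and map_cache are hypothetical names.
-- Compute the map once and store it as a MAP<STRING,STRING> column.
CREATE TABLE map_cache AS
SELECT str_to_map("k1:1,k2:2,k3:3") AS map_col
FROM src_table;

-- Later queries read the stored map instead of rebuilding it.
SELECT map_col["k1"] AS col1,
       map_col["k2"] AS col2,
       map_col["k3"] AS col3
FROM map_cache;
```

This trades extra storage and a write step for not recomputing the expression, so it tends to pay off only when the result is reused by several queries or workflows.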