How to avoid evaluating the same calculated column in a Hive query repeatedly


Question

Let's say I have a calculated column:

select str_to_map("k1:1,k2:2,k3:3")["k1"] as col1,
       str_to_map("k1:1,k2:2,k3:3")["k2"] as col2,
       str_to_map("k1:1,k2:2,k3:3")["k3"] as col3;

How do I 'fix' the column so it is calculated only once and then access its value multiple times in the query? The map being calculated is the same each time; only the key being accessed differs from column to column. Performing the same calculation repeatedly is a waste of resources. This example is deliberately oversimplified, but the point is that I want to know how to avoid this kind of redundancy in Hive in general.

Solution

In general, use a subquery: the expression is calculated once inside it, and the outer query refers to the result by its alias.

select map_col["k1"] as col1,
       map_col["k2"] as col2,
       map_col["k3"] as col3
from 
(
 select str_to_map("k1:1,k2:2,k3:3") as map_col from table... 
)s;

You can also materialize an intermediate query result into a table to reuse the dataset across different queries or workflows.
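A minimal sketch of the materialization approach, assuming a Hive version that supports CREATE TABLE ... AS SELECT (CTAS). The names my_db.parsed_maps and source_table are hypothetical; source_table stands in for the "table..." placeholder in the query above.

```sql
-- Hypothetical names: my_db.parsed_maps (materialized table) and
-- source_table (placeholder for the real input table).
-- Step 1: evaluate str_to_map once and store the result as a map column.
-- With the default delimiters, ',' separates pairs and ':' separates key from value.
CREATE TABLE my_db.parsed_maps AS
SELECT str_to_map("k1:1,k2:2,k3:3") AS map_col
FROM source_table;

-- Step 2: later queries or workflow steps read the stored map and only
-- perform key lookups; str_to_map is not evaluated again here.
SELECT map_col["k1"] AS col1,
       map_col["k2"] AS col2,
       map_col["k3"] AS col3
FROM my_db.parsed_maps;
```

This trades extra storage and a write step for not recomputing the map in every downstream query, so it pays off mainly when the same derived dataset is read several times.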
