Hive Map-Join configuration mystery
Question
Can someone clearly explain the hive.auto.convert.join and hive.auto.convert.join.noconditionaltask configuration parameters?
Also these corresponding size parameters: hive.mapjoin.smalltable.filesize and hive.auto.convert.join.noconditionaltask.size.
My observation is that when running on Tez, Map-Join works when hive.auto.convert.join.noconditionaltask.size is set to a high enough value, even when hive.mapjoin.smalltable.filesize is set to less than the size of the small table.
Why do we need both hive.auto.convert.join and hive.auto.convert.join.noconditionaltask?
The Apache documentation is very confusing.
Recommended answer
These parameters determine when Hive uses a map join instead of a common join, which ultimately affects query performance.
A map join is used when one of the joined tables is small enough to fit in memory, which makes it very fast. Here is an explanation of each parameter:
hive.auto.convert.join
When this parameter is set to true, Hive automatically checks whether the file size of the smaller table is larger than the value specified by hive.mapjoin.smalltable.filesize. If it is larger, the query executes through a common join; otherwise the join is converted to a map join. Once auto convert join is enabled, there is no need to provide map join hints in the query.
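As a minimal sketch of these two settings working together, a session might look like the following. The table names orders and customers are hypothetical, and the 25000000-byte threshold is only an illustrative value chosen here, not something taken from the answer.

```sql
-- Let Hive decide on its own whether a join can be converted to a map join.
SET hive.auto.convert.join=true;
-- Threshold in bytes below which a table counts as "small" (illustrative value, roughly 25 MB).
SET hive.mapjoin.smalltable.filesize=25000000;

-- Hypothetical tables: orders is assumed large, customers is assumed small.
-- No /*+ MAPJOIN(c) */ hint is given; EXPLAIN shows which join strategy was chosen.
EXPLAIN
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```

In the resulting plan, a converted join typically appears as a Map Join Operator, whereas a common join is carried out on the reduce side.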
hive.auto.convert.join.noconditionaltask
When three or more tables are involved in a join:
With hive.auto.convert.join = true, Hive generates three or more map-side joins, on the assumption that all the tables are small.
With hive.auto.convert.join.noconditionaltask = true, Hive combines the three or more map-side joins into a single map-side join if the combined size of n-1 of the tables is less than 10 MB. This size limit is defined by hive.auto.convert.join.noconditionaltask.size.
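The three-table case could be sketched as follows. The tables orders, customers and regions are hypothetical, and the example assumes the two dimension tables together stay under the 10 MB limit mentioned above.

```sql
SET hive.auto.convert.join=true;
-- Allow several map-side joins to be merged into a single map-side join.
SET hive.auto.convert.join.noconditionaltask=true;
-- Limit in bytes for the combined size of the n-1 small tables (10 MB, as in the text).
SET hive.auto.convert.join.noconditionaltask.size=10000000;

-- Hypothetical three-way join: orders is assumed large,
-- customers and regions are assumed small enough to be held in memory together.
EXPLAIN
SELECT o.order_id, c.customer_name, r.region_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN regions r ON c.region_id = r.region_id;
```

Comparing the EXPLAIN output with hive.auto.convert.join.noconditionaltask set to false and then to true is one way to see whether the joins were merged.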
hive.mapjoin.smalltable.filesize
This setting tells the optimizer what counts as a small table in your system. When a query executes, Hive uses this value to determine whether a join is eligible for conversion to a map join.
hive.auto.convert.join.noconditionaltask.size
This size setting lets the user control what size of table can fit in memory. The value represents the sum of the sizes of the tables that can be converted to hashmaps that fit in memory.
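Because the value is compared against a sum, a small worked case may help. The table names and sizes below are made up for illustration only.

```sql
-- Hypothetical sizes: customers is about 6 MB and regions is about 7 MB.
-- Their sum (about 13 MB) exceeds a 10000000-byte limit, so under the rule
-- described above the two tables would not qualify together.
-- Raising the limit to roughly 20 MB lets both fit under it:
SET hive.auto.convert.join.noconditionaltask.size=20000000;
```

Since the tables are loaded as in-memory hashmaps, the limit presumably needs to stay well within the memory actually available to each task.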
Here is a very good explanation that describes all four parameters, with an example:
http://www.openkb.info/2016/01/差异hivemapjoinsmalltabl.html