Hive Map-Join configuration mystery
Question
Could someone clearly explain the difference between the hive.auto.convert.join and hive.auto.convert.join.noconditionaltask configuration parameters?
And what about the corresponding size parameters, hive.mapjoin.smalltable.filesize and hive.auto.convert.join.noconditionaltask.size?
My observation is that, when running on Tez, map join still works as long as hive.auto.convert.join.noconditionaltask.size is set high enough, even when hive.mapjoin.smalltable.filesize is set lower than the size of the small table.
Why do we need both hive.auto.convert.join and hive.auto.convert.join.noconditionaltask? The Apache documentation is very confusing.
Recommended answer
These parameters determine when Hive uses a map join instead of a common join, which ultimately affects query performance.
A map join is used when one of the joined tables is small enough to fit in memory, which makes it very fast. Here is an explanation of each parameter:
hive.auto.convert.join
When this parameter is set to true, Hive automatically compares the file size of the smaller table with the value of hive.mapjoin.smalltable.filesize. If the table is larger than that value, the query runs as a common join; otherwise the join is converted to a map join. Once automatic conversion is enabled, there is no need to provide map join hints in the query.
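As a minimal sketch of the behaviour described above, the settings below enable automatic conversion for a two-table join. The tables orders and dim_country and their columns are hypothetical, and the 25000000-byte threshold is an assumed value chosen for illustration; check the default in your own Hive version.

```sql
-- Allow Hive to convert eligible joins to map joins automatically.
SET hive.auto.convert.join=true;
-- Size threshold in bytes for the "small" table (assumed value, about 25 MB).
SET hive.mapjoin.smalltable.filesize=25000000;

-- No MAPJOIN hint is given; orders and dim_country are hypothetical tables.
SELECT o.order_id, c.country_name
FROM orders o
JOIN dim_country c ON o.country_id = c.country_id;
```

If dim_country is larger than the threshold, the same query would be expected to fall back to a common join.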
hive.auto.convert.join.noconditionaltask
When three or more tables are involved in a join:
hive.auto.convert.join = true - Hive generates three or more map-side joins, on the assumption that all of the tables are small.
hive.auto.convert.join.noconditionaltask = true - Hive combines the three or more map-side joins into a single map-side join if the combined size of n-1 of the tables is below the threshold set by hive.auto.convert.join.noconditionaltask.size (10 MB by default).
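The three-table case can be sketched as follows. The tables sales, dim_product and dim_store and their columns are hypothetical, and the 10000000-byte threshold simply mirrors the 10 MB figure mentioned above.

```sql
-- Enable automatic map-join conversion without a conditional task.
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;
-- Combined size limit in bytes for the n-1 small tables (about 10 MB).
SET hive.auto.convert.join.noconditionaltask.size=10000000;

-- Inspect the plan instead of running the query.
-- sales is the large table; dim_product and dim_store are the small ones.
EXPLAIN
SELECT f.sale_id, p.product_name, s.store_name
FROM sales f
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_store s ON f.store_id = s.store_id;
```

If the conversion applies, the plan would be expected to show map join operators rather than a shuffle-based join. The exact plan text depends on the Hive version and the execution engine.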
hive.mapjoin.smalltable.filesize
This setting tells the optimizer what counts as a small table on your system. When a query runs, Hive uses this value to decide whether a join is eligible for conversion to a map join.
hive.auto.convert.join.noconditionaltask.size
This setting lets the user control how large a table can be and still fit in memory. Its value is the maximum combined size of the tables that can be converted to in-memory hashmaps.
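To make the "sum of the sizes" concrete, here is a small worked example with made-up table sizes. The tables dim_product and dim_store and their sizes are assumptions for illustration only.

```sql
-- Hypothetical sizes: dim_product is about 4 MB and dim_store about 9 MB,
-- so their hash tables need about 13 MB in total.
-- With a threshold of 10000000 bytes (about 10 MB), 13 MB exceeds the limit,
-- so the two tables do not both qualify for a single map-side join.
-- Raising the threshold to about 20 MB brings the 13 MB total under it.
SET hive.auto.convert.join.noconditionaltask.size=20000000;
```

A larger value only helps if the tasks really have that much memory available, so the threshold is best raised together with a check of the task or container memory settings.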
Here is a link to a very good explanation that describes all four parameters, with an example:
http://www.openkb.info/2016/01/Difference-between-hivemapjoinsmalltabl.html