Hive Map-Join configuration mystery

Question

Could someone clearly explain the difference between the hive.auto.convert.join and hive.auto.convert.join.noconditionaltask configuration parameters?

Also, what about these corresponding size parameters:

hive.mapjoin.smalltable.filesize

hive.auto.convert.join.noconditionaltask.size

My observation is that, when running on Tez, the map join still works when hive.auto.convert.join.noconditionaltask.size is set high enough, even if hive.mapjoin.smalltable.filesize is set to less than the size of the small table.

Why do we need both hive.auto.convert.join and hive.auto.convert.join.noconditionaltask?

The Apache documentation is very confusing.

Recommended answer

These parameters decide when Hive uses a map join instead of a common join, which ultimately affects query performance.

A map join is used when one of the joined tables is small enough to fit in memory, so it is very fast. Here is an explanation of all the parameters:

hive.auto.convert.join

When this parameter is set to true, Hive automatically checks whether the file size of the smaller table exceeds the value of hive.mapjoin.smalltable.filesize. If it does, the query runs as a common join; otherwise the join is converted to a map join. Once automatic conversion is enabled, there is no need to provide map join hints in the query.
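As a minimal sketch of the behaviour described above, the session settings below enable automatic conversion and set the small-table threshold. The tables orders and customers and their columns are hypothetical, and the 25000000-byte threshold is only an illustrative value, not a recommendation.

```sql
-- Let Hive decide on its own whether a join can run as a map join.
SET hive.auto.convert.join=true;

-- Size in bytes below which a table counts as "small" (illustrative value).
SET hive.mapjoin.smalltable.filesize=25000000;

-- Hypothetical tables: note that no /*+ MAPJOIN(c) */ hint is needed.
EXPLAIN
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```

If the conversion applies, the plan printed by EXPLAIN should contain a Map Join Operator instead of a reduce-side Join Operator, which is a quick way to check what Hive actually decided.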

hive.auto.convert.join.noconditionaltask

When three or more tables are involved in a join:

hive.auto.convert.join = true: Hive generates a separate map-side join for each join step, on the assumption that the tables are small.

hive.auto.convert.join.noconditionaltask = true: Hive combines those map-side joins into a single map-side join if the combined size of the n-1 smaller tables is below the threshold set by hive.auto.convert.join.noconditionaltask.size (10 MB by default).
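The multi-table case could be set up roughly as follows. This is a sketch under assumptions: the fact table sales and the dimension tables dim_region and dim_product are hypothetical, and 10000000 bytes is used only because it matches the 10 MB figure mentioned above.

```sql
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;

-- Limit in bytes for the combined size of the n-1 smaller tables (about 10 MB).
SET hive.auto.convert.join.noconditionaltask.size=10000000;

-- Hypothetical query: one large fact table joined to two small dimension tables.
-- If both dimension tables together fit under the limit, the two joins
-- are candidates for being merged into a single map-side join.
EXPLAIN
SELECT f.sale_id, r.region_name, p.category
FROM sales f
JOIN dim_region r ON f.region_id = r.region_id
JOIN dim_product p ON f.product_id = p.product_id;
```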

hive.mapjoin.smalltable.filesize

This setting tells the optimizer what counts as a small table on your system. When a query runs, Hive compares the table's file size against this value to decide whether the join is eligible for conversion to a map join.

hive.auto.convert.join.noconditionaltask.size

This size setting lets the user control how large a table can be and still fit in memory. The value is the maximum combined size of the tables that can be converted to in-memory hash maps.
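To check the observation from the question on your own cluster, a setup along these lines may help. The table names big_table and small_table are hypothetical and the byte values are arbitrary; the only point is that one threshold is deliberately tiny while the other is generous.

```sql
SET hive.execution.engine=tez;
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;

-- Deliberately smaller than the small table (arbitrary value for the experiment).
SET hive.mapjoin.smalltable.filesize=1000;

-- Deliberately larger than the small table (arbitrary value for the experiment).
SET hive.auto.convert.join.noconditionaltask.size=100000000;

-- Hypothetical tables: inspect the plan to see which join strategy was chosen.
EXPLAIN
SELECT b.id, s.label
FROM big_table b
JOIN small_table s ON b.id = s.id;
```

According to the question, the join still runs as a map join on Tez with settings like these, which suggests that hive.auto.convert.join.noconditionaltask.size is the threshold that matters when the no-conditional-task path is enabled.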

Here is a very good explanation that covers all four parameters, with an example:

http://www.openkb.info/2016/01/Difference-between-hivemapjoinsmalltabl.html
