Hive Map-Join configuration mystery


Question

Can someone clearly explain the hive.auto.convert.join and hive.auto.convert.join.noconditionaltask configuration parameters?

Also, what about the corresponding size parameters, hive.mapjoin.smalltable.filesize and hive.auto.convert.join.noconditionaltask.size?

My observation is that, when running on Tez, a map join is used as long as hive.auto.convert.join.noconditionaltask.size is set high enough, even when hive.mapjoin.smalltable.filesize is set lower than the size of the small table.

Why do we need both hive.auto.convert.join and hive.auto.convert.join.noconditionaltask?

The Apache documentation is very confusing.

Recommended answer

These parameters control when Hive uses a map join instead of a common join, which in turn affects query performance.

A map join is used when one of the joined tables is small enough to fit in memory, which makes it very fast. Each parameter is explained below:

hive.auto.convert.join

When this parameter is set to true, Hive automatically checks whether the file size of the smaller table exceeds the value of hive.mapjoin.smalltable.filesize. If it does, the query runs as a common join; otherwise the join is converted to a map join. With auto conversion enabled, there is no need to put map join hints in the query.
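The check described above can be modeled in a few lines of Python. This is only a simplified sketch of the rule as stated in this answer, not Hive's planner code; the function name `join_strategy` and its arguments are made up for illustration, and the 25000000-byte default is the documented default of hive.mapjoin.smalltable.filesize.

```python
# Simplified model of the size check described above; not Hive's actual code.
DEFAULT_SMALLTABLE_FILESIZE = 25_000_000  # bytes, default of hive.mapjoin.smalltable.filesize


def join_strategy(small_table_bytes: int,
                  auto_convert_join: bool = True,
                  smalltable_filesize: int = DEFAULT_SMALLTABLE_FILESIZE) -> str:
    """Return "map join" or "common join" for a two-table join (hypothetical helper)."""
    if not auto_convert_join:
        # Auto conversion is off, so this model never picks a map join.
        return "common join"
    if small_table_bytes > smalltable_filesize:
        # The smaller table exceeds the threshold: fall back to a common join.
        return "common join"
    return "map join"


print(join_strategy(5_000_000))   # smaller table is about 5 MB
print(join_strategy(40_000_000))  # smaller table is about 40 MB
```

With the default threshold, the first call falls under the limit and the second exceeds it.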

hive.auto.convert.join.noconditionaltask

When three or more tables are involved in a join:

With hive.auto.convert.join = true, Hive generates multiple map-side joins, on the assumption that all the tables are small.

With hive.auto.convert.join.noconditionaltask = true, Hive combines those map-side joins into a single map-side join if the combined size of n-1 of the tables is below the threshold set by hive.auto.convert.join.noconditionaltask.size (10 MB by default).
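The n-1 rule can be illustrated with a short Python sketch. It assumes the n-1 tables are the ones left after setting aside the largest table, and it is a model of the rule as stated here rather than Hive's implementation; `merges_into_single_map_join` is a hypothetical name.

```python
from typing import Sequence

# Default of hive.auto.convert.join.noconditionaltask.size, in bytes.
DEFAULT_NOCONDITIONALTASK_SIZE = 10_000_000


def merges_into_single_map_join(table_bytes: Sequence[int],
                                threshold: int = DEFAULT_NOCONDITIONALTASK_SIZE) -> bool:
    """True if the n-1 smallest tables together stay below the threshold."""
    if len(table_bytes) < 2:
        raise ValueError("a join needs at least two tables")
    # Set aside the largest table: it is streamed, the others are held in memory.
    small_side = sorted(table_bytes)[:-1]
    return sum(small_side) < threshold


# One 500 MB fact table joined with two small dimension tables.
print(merges_into_single_map_join([500_000_000, 4_000_000, 3_000_000]))
print(merges_into_single_map_join([500_000_000, 6_000_000, 6_000_000]))
```

In the first call the two small tables add up to 7 MB, under the 10 MB default; in the second they add up to 12 MB, so the single merged map join would not apply in this model.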

hive.mapjoin.smalltable.filesize

This setting tells the optimizer what counts as a small table on your system. When a query runs, Hive compares the table size against this value to decide whether the join is eligible for conversion to a map join.

hive.auto.convert.join.noconditionaltask.size

This size setting lets you control how large a table can be and still be loaded into memory. The value is the limit on the combined size of the tables that are converted to in-memory hash maps.
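The observation in the question (a map join on Tez even though hive.mapjoin.smalltable.filesize is below the small table's size) is consistent with the direct conversion being evaluated first. The Python sketch below models that ordering for a two-table join. It assumes hive.auto.convert.join = true, the fallback branch is a simplification, and `plan_two_table_join` is a hypothetical name, not a Hive API.

```python
def plan_two_table_join(small_table_bytes: int,
                        noconditionaltask: bool = True,
                        noconditionaltask_size: int = 10_000_000,
                        smalltable_filesize: int = 25_000_000) -> str:
    """Model which join a two-table query gets (assumes hive.auto.convert.join = true)."""
    # Assumption: the direct conversion is checked first and does not
    # consult hive.mapjoin.smalltable.filesize at all.
    if noconditionaltask and small_table_bytes < noconditionaltask_size:
        return "map join (converted directly)"
    # Assumption: otherwise the smalltable.filesize check decides.
    if small_table_bytes <= smalltable_filesize:
        return "map join (size check against smalltable.filesize)"
    return "common join"


# The scenario from the question: a 5 MB table, smalltable.filesize set to 1 MB,
# noconditionaltask.size set to 50 MB.
print(plan_two_table_join(5_000_000,
                          noconditionaltask_size=50_000_000,
                          smalltable_filesize=1_000_000))
```

Under these assumptions the question's scenario still yields a map join, because the first branch returns before the smaller threshold is ever compared.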

Here is a link to a very good explanation that describes all four parameters, with an example:

http://www.openkb.info/2016/01/差异hivemapjoinsmalltabl.html
