How to reduce the number of containers in a query


Question

I have a query that uses too many containers and too much memory (97% of the memory is used). Is there a way to set the number of containers used by the query and to limit the maximum memory? The query is running on Tez.

Thanks in advance.

Recommended answer

Controlling the number of mappers:

The number of mappers depends on various factors, such as how the data is distributed among nodes, the input format, the execution engine, and configuration parameters. See also: How initial task parallelism works.

MR uses CombineInputFormat, while Tez uses grouped splits.

Tez:

set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split

Increase these figures to reduce the number of mappers running.
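As an illustration only, the same two settings with larger values might look like the sketch below. The specific numbers are assumptions chosen for the example, not recommendations; suitable values depend on file sizes and cluster capacity.

```sql
-- Illustrative values (assumed, not tuned for any particular cluster).
-- Larger grouped splits mean fewer, longer-running mappers.
set tez.grouping.min-size=268435456;  -- 256 MB min split
set tez.grouping.max-size=2147483648; -- 2 GB max split
```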

Mappers also run on the data nodes where the data is located, which is why manually controlling the number of mappers is not easy: it is not always possible to combine inputs.

Controlling the number of reducers:

The number of reducers is determined by the following settings:

  • mapreduce.job.reduces - The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default value. When this property is set to -1, Hive automatically works out the number of reducers.
  • hive.exec.reducers.bytes.per.reducer - The size of input per reducer. The default is 1 GB before Hive 0.14.0 and 256 MB in Hive 0.14.0 and later.
  • hive.exec.reducers.max - The maximum number of reducers that will be used. If mapreduce.job.reduces is negative, Hive uses this as the upper bound when automatically determining the number of reducers.

Simply set hive.exec.reducers.max=<number> to limit the number of reducers running.

If you want to increase reducer parallelism, increase hive.exec.reducers.max and decrease hive.exec.reducers.bytes.per.reducer.
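A minimal sketch of capping reducers for a session follows. The values and the 10 GB input size are assumptions for illustration, and the arithmetic in the comments is only the rough size-based estimate; the number of reducers Tez actually runs can differ.

```sql
-- Let Hive estimate the reducer count instead of fixing it.
set mapreduce.job.reduces=-1;
-- Assumed value: aim for about 256 MB of input per reducer.
set hive.exec.reducers.bytes.per.reducer=268435456;
-- Assumed value: never run more than 20 reducers.
set hive.exec.reducers.max=20;
-- Rough estimate for 10 GB of reducer input (10737418240 bytes):
--   10737418240 / 268435456 = 40 reducers by size,
--   capped by hive.exec.reducers.max to 20.
```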

Memory settings

      set tez.am.resource.memory.mb=8192;
      set tez.am.java.opts=-Xmx6144m;
      set tez.reduce.memory.mb=6144;
      set hive.tez.container.size=9216;
      set hive.tez.java.opts=-Xmx6144m;
      

By default, the actual Tez task uses the mapper's memory settings:

      hive.tez.container.size = mapreduce.map.memory.mb
      hive.tez.java.opts = mapreduce.map.java.opts
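To limit memory per container, both values can also be set explicitly. The sketch below assumes a 4 GB container and a heap of roughly 80% of it, which is a commonly used rule of thumb rather than something stated in the answer; the numbers are illustrative.

```sql
-- Assumed values for illustration; adjust to the cluster's YARN limits.
set hive.tez.container.size=4096;  -- container size in MB
set hive.tez.java.opts=-Xmx3276m;  -- JVM heap, about 80% of the container
```

The heap is kept below the container size to leave room for the JVM's non-heap memory.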
      

Read this for more details: Demystify Apache Tez Memory Tuning - Step by Step

I would suggest optimizing the query first. Use map joins where possible, enable vectorized execution, add DISTRIBUTE BY on the partition key if you are writing to a partitioned table (this reduces memory consumption on the reducers), and, of course, write good SQL.
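These suggestions could be sketched as follows. The table and column names (sales_staging, sales_partitioned, sale_date) are hypothetical, and whether vectorization applies depends on the file format and Hive version.

```sql
-- Allow Hive to convert joins against small tables into map joins.
set hive.auto.convert.join=true;
-- Enable vectorized query execution.
set hive.vectorized.execution.enabled=true;
-- Needed for the dynamic-partition insert below.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical tables: each reducer receives whole partitions,
-- so it keeps fewer partition writers open at once.
INSERT OVERWRITE TABLE sales_partitioned PARTITION (sale_date)
SELECT id, amount, sale_date
FROM sales_staging
DISTRIBUTE BY sale_date;
```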
