为什么这个 MapReduce 中最后的 reduce 步骤非常慢?(HiveQL、HDFS MapReduce) [英] Why is the final reduce step extremely slow in this MapReduce? (HiveQL, HDFS MapReduce)

查看:24
本文介绍了为什么这个 MapReduce 中最后的 reduce 步骤非常慢?(HiveQL、HDFS MapReduce)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

一些背景信息:

我正在使用 Dataiku DSS、HDFS 和分区数据集.我有一个正在运行的特定作业(Hive 查询),它有两个输入数据集 - 一个是非常大的分区数据集,另一个是小型(约 250 行,2 列)非分区数据集.我们称分区表为 A,非分区表为 B.

问题:

查询格式如下,

选择 a.f1, f2, ..., fn从 A 作为左连接 B 作为 b开 a.f1 = b.f1在哪里 {PARTITION_FILTER}

这是 MapReduce 作业的当前输出(请记住,此作业仍在运行):

[09:05:53] [INFO] [dku.utils] - 信息:工作总数 = 4[09:05:53] [INFO] [dku.utils] - INFO:以串行模式启动任务 [Stage-10:CONDITIONAL][09:05:53] [INFO] [dku.utils] - INFO:Stage-11 被条件解析器过滤掉.[09:05:53] [INFO] [dku.utils] - INFO:Stage-1 由条件解析器选择.[09:05:53] [INFO] [dku.utils] - 信息:启动作业 1,共 4 个[09:05:53] [INFO] [dku.utils] - INFO:以串行模式启动任务 [Stage-1:MAPRED][09:05:53] [INFO] [dku.utils] - INFO:未指定减少任务的数量.根据输入数据大小估计:307[09:05:53] [INFO] [dku.utils] - INFO:为了更改减速器的平均负载(以字节为单位):[09:05:53] [INFO] [dku.utils] - 信息:设置 hive.exec.reducers.bytes.per.reducer=<number>[09:05:53] [INFO] [dku.utils] - INFO:为了限制减速器的最大数量:[09:05:53] [INFO] [dku.utils] - 信息:设置 hive.exec.reducers.max=<number>[09:05:53] [INFO] [dku.utils] - INFO:为了设置一个恒定数量的减速器:[09:05:53] [INFO] [dku.utils] - 信息:设置 mapreduce.job.reduces=<number>[09:05:53] [INFO] [dku.utils] - 信息:分割数:75[09:05:53] [INFO] [dku.utils] - INFO:提交作业令牌:job_1529508387832_10211[09:05:53] [INFO] [dip.hiveserver2.log.sniffer] - 嗅探到 applicationId =>应用程序_1529508387832_10211/[09:05:53] [信息] [dku.utils] - 信息:杀死命令 =/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop/bin/hadoop 作业-kill job_1529508387832_10211[09:05:53] [INFO] [dku.utils] - INFO:第一阶段的 Hadoop 作业信息:映射器数量:75;减速机数量:307[09:05:53] [INFO] [dku.utils] - INFO:2018-06-27 09:05:47,749 阶段 1 地图 = 0%,减少 = 0%[09:06:48] [INFO] [dku.utils] - INFO:2018-06-27 09:06:48,444 Stage-1 map = 0%,reduce = 0%,Cumulative CPU 278.18 sec[09:06:53] [INFO] [dku.utils] - INFO:2018-06-27 09:06:51,682 Stage-1 map = 1%,reduce = 0%,Cumulative CPU 373.35 sec[09:07:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:00,159 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 501.46 sec[09:07:03] [INFO] [dku.utils] - INFO:2018-06-27 09:07:02,235 Stage-1 map = 3%,reduce = 0%,Cumulative CPU 539.6 sec[09:07:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:07,472 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 1389.81 sec[09:07:13] [INFO] [dku.utils] - INFO:2018-06-27 09:07:10,605 Stage-1 map = 5%,reduce = 0%,Cumulative CPU 2172.52 sec[09:07:13] [INFO] [dku.utils] - INFO:2018-06-27 09:07:12,574 Stage-1 map = 6%,reduce = 0%,Cumulative CPU 2577.81 sec[09:07:13] [INFO] [dku.utils] - INFO:2018-06-27 09:07:13,604 Stage-1 map = 8%,reduce = 0%,Cumulative CPU 2865.44 sec[09:07:18] [INFO] [dku.utils] - INFO:2018-06-27 09:07:15,747 Stage-1 map = 9%,reduce = 0%,Cumulative CPU 3110.21 sec[09:07:23] [INFO] [dku.utils] - INFO:2018-06-27 09:07:19,898 Stage-1 map = 10%,reduce = 0%,Cumulative CPU 4080.2 sec[09:07:23] [INFO] [dku.utils] - INFO:2018-06-27 09:07:21,988 Stage-1 map = 11%,reduce = 0%,Cumulative CPU 4522.48 sec[09:07:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:23,015 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 4755.96 sec[09:07:33] [INFO] [dku.utils] - INFO:2018-06-27 09:07:29,335 Stage-1 map = 13%,reduce = 0%,Cumulative CPU 5710.85 sec[09:07:33] [INFO] [dku.utils] - INFO:2018-06-27 09:07:31,407 Stage-1 map = 14%,reduce = 0%,Cumulative CPU 5948.34 sec[09:07:38] [INFO] [dku.utils] - INFO:2018-06-27 09:07:34,555 Stage-1 map = 15%,reduce = 0%,Cumulative CPU 6399.6 sec[09:07:38] [INFO] [dku.utils] - INFO:2018-06-27 09:07:37,663 Stage-1 map = 16%,reduce = 0%,Cumulative CPU 6811.22 sec[09:07:43] [INFO] [dku.utils] - INFO:2018-06-27 09:07:38,695 Stage-1 map = 19%,reduce = 0%,Cumulative CPU 7087.68 sec[09:07:43] [INFO] [dku.utils] - INFO:2018-06-27 09:07:39,729 Stage-1 map = 20%,reduce = 0%,Cumulative CPU 7288.22 sec[09:07:43] [INFO] [dku.utils] - INFO:2018-06-27 09:07:40,769 Stage-1 map = 22%,reduce = 0%,Cumulative CPU 7520.54 sec[09:07:43] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:41,903 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 7771.37 sec[09:07:43] [INFO] [dku.utils] - INFO:2018-06-27 09:07:42,930 Stage-1 map = 25%,reduce = 0%,Cumulative CPU 7936.9 sec[09:07:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:45,035 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 8254.78 sec[09:07:48] [INFO] [dku.utils] - INFO:2018-06-27 09:07:46,075 Stage-1 map = 29%,reduce = 0%,Cumulative CPU 8428.35 sec[09:07:48] [INFO] [dku.utils] - INFO:2018-06-27 09:07:47,111 Stage-1 map = 30%,reduce = 0%,Cumulative CPU 8661.23 sec[09:07:48] [INFO] [dku.utils] - INFO:2018-06-27 09:07:48,153 Stage-1 map = 31%,reduce = 0%,Cumulative CPU 8834.37 sec[09:07:53] [INFO] [dku.utils] - INFO:2018-06-27 09:07:49,193 Stage-1 map = 33%,reduce = 0%,Cumulative CPU 8983.68 sec[09:07:53] [INFO] [dku.utils] - INFO:2018-06-27 09:07:50,227 Stage-1 map = 35%,reduce = 0%,Cumulative CPU 9149.94 sec[09:07:53] [INFO] [dku.utils] - INFO:2018-06-27 09:07:51,263 Stage-1 map = 37%,reduce = 0%,Cumulative CPU 9268.9 sec[09:07:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:52,301 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 9415.86 sec[09:07:53] [INFO] [dku.utils] - INFO:2018-06-27 09:07:53,352 Stage-1 map = 39%,reduce = 0%,Cumulative CPU 9540.63 sec[09:07:58] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:54,381 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 9711.54 sec[09:07:58] [INFO] [dku.utils] - INFO:2018-06-27 09:07:55,421 Stage-1 map = 43%,reduce = 0%,Cumulative CPU 9823.52 sec[09:07:58] [INFO] [dku.utils] - INFO : 2018-06-27 09:07:56,453 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 10010.83 sec[09:07:58] [INFO] [dku.utils] - INFO:2018-06-27 09:07:57,492 Stage-1 map = 49%,reduce = 0%,Cumulative CPU 10081.9 sec[09:07:58] [INFO] [dku.utils] - INFO:2018-06-27 09:07:58,532 Stage-1 map = 53%,reduce = 0%,Cumulative CPU 10230.13 sec[09:08:03] [INFO] [dku.utils] - INFO:2018-06-27 09:07:59,576 Stage-1 map = 56%,reduce = 0%,Cumulative CPU 10392.61 sec[09:08:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:00,607 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 10483.38 sec[09:08:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:01,649 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 10618.16 sec[09:08:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:02,672 Stage-1 map = 66%, reduce = 0%, Cumulative CPU 10684.82 sec[09:08:08] [INFO] [dku.utils] - INFO:2018-06-27 09:08:03,695 Stage-1 map = 67%,reduce = 0%,Cumulative CPU 10701.95 sec[09:08:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:04,720 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 10767.21 sec[09:08:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:05,750 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 10849.92 sec[09:08:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:06,780 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 10924.45 sec[09:08:08] [INFO] [dku.utils] - INFO:2018-06-27 09:08:07,902 Stage-1 map = 76%,reduce = 0%,Cumulative CPU 11000.21 sec[09:08:13] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:09,965 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 11013.58 sec[09:08:13] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:10,991 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 11057.76 sec[09:08:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:14,216 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 11157.51 sec[09:08:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:17,362 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 11392.85 sec[09:08:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:20,460 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 11610.7 sec[09:08:28] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:26,106 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 11781.65 sec[09:08:28] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:28,163 Stage-1 map = 85%, reduce = 0%, Cumulative CPU 11788.58 sec[09:08:38] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:35,410 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 12167.24 sec[09:08:38] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:38,540 Stage-1 map = 86%, reduce = 1%, Cumulative CPU 12317.09 sec[09:08:43] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:39,583 Stage-1 map = 86%, reduce = 2%, Cumulative CPU 12329.8 sec[09:08:43] [INFO] [dku.utils] - INFO:2018-06-27 09:08:40,631 Stage-1 map = 86%,reduce = 3%,Cumulative CPU 12333.61 sec[09:08:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:47,927 Stage-1 map = 87%, reduce = 3%, Cumulative CPU 12651.77 sec[09:08:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:48,970 Stage-1 map = 88%, reduce = 4%, Cumulative CPU 12826.37 sec[09:08:53] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:51,092 Stage-1 map = 88%, reduce = 5%, Cumulative CPU 12857.19 sec[09:08:58] [INFO] [dku.utils] - INFO : 2018-06-27 09:08:54,214 Stage-1 map = 90%, reduce = 5%, Cumulative CPU 13037.63 sec[09:09:03] [INFO] [dku.utils] - INFO:2018-06-27 09:08:59,396 Stage-1 map = 91%,reduce = 5%,Cumulative CPU 13117.71 sec[09:09:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:00,440 Stage-1 map = 92%, reduce = 5%, Cumulative CPU 13238.06 sec[09:09:03] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:01,485 Stage-1 map = 93%, reduce = 5%, Cumulative CPU 13249.8 sec[09:09:08] [INFO] [dku.utils] - INFO:2018-06-27 09:09:05,660 Stage-1 map = 94%,reduce = 5%,Cumulative CPU 13306.0 sec[09:09:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:06,706 Stage-1 map = 97%, reduce = 5%, Cumulative CPU 13393.5 sec[09:09:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:07,751 Stage-1 map = 97%, reduce = 6%, Cumulative CPU 13409.12 sec[09:09:08] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:08,795 Stage-1 map = 98%, reduce = 6%, Cumulative CPU 13433.07 sec[09:09:13] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:09,835 Stage-1 map = 98%, reduce = 8%, Cumulative CPU 13474.03 sec[09:09:13] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:10,874 Stage-1 map = 98%, reduce = 9%, Cumulative CPU 13484.64 sec[09:09:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:14,004 Stage-1 map = 100%, reduce = 11%, Cumulative CPU 13580.71 sec[09:09:18] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:15,118 Stage-1 map = 100%, reduce = 13%, Cumulative CPU 13619.15 sec[09:09:18] [INFO] [dku.utils] - INFO:2018-06-27 09:09:16,160 Stage-1 map = 100%,reduce = 16%,Cumulative CPU 13707.2 sec[09:09:18] [INFO] [dku.utils] - INFO:2018-06-27 09:09:17,210 Stage-1 map = 100%,reduce = 33%,Cumulative CPU 14456.75 sec[09:09:18] [INFO] [dku.utils] - INFO:2018-06-27 09:09:18,258 Stage-1 map = 100%,reduce = 39%,Cumulative CPU 14708.07 sec[09:09:23] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:19,291 Stage-1 map = 100%, reduce = 40%, Cumulative CPU 14768.29 sec[09:09:23] [INFO] [dku.utils] - INFO:2018-06-27 09:09:20,329 Stage-1 map = 100%,reduce = 41%,Cumulative CPU 14834.88 sec[09:09:23] [INFO] [dku.utils] - INFO:2018-06-27 09:09:21,360 Stage-1 map = 100%,reduce = 42%,Cumulative CPU 14902.4 sec[09:09:23] [INFO] [dku.utils] - INFO:2018-06-27 09:09:22,399 Stage-1 map = 100%,reduce = 45%,Cumulative CPU 15040.16 sec[09:09:23] [INFO] [dku.utils] - INFO:2018-06-27 09:09:23,433 Stage-1 map = 100%,reduce = 47%,Cumulative CPU 15165.58 sec[09:09:28] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:28,627 Stage-1 map = 100%, reduce = 63%, Cumulative CPU 15792.29 sec[09:09:33] [INFO] [dku.utils] - INFO:2018-06-27 09:09:30,711 Stage-1 map = 100%,reduce = 64%,Cumulative CPU 15889.21 sec[09:09:33] [INFO] [dku.utils] - INFO:2018-06-27 09:09:31,753 Stage-1 map = 100%,reduce = 65%,Cumulative CPU 15898.95 sec[09:09:33] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:32,789 Stage-1 map = 100%, reduce = 66%, Cumulative CPU 15927.26 sec[09:09:33] [INFO] [dku.utils] - INFO:2018-06-27 09:09:33,822 阶段 1 映射 = 100%,减少 = 71%,累积 CPU 16086.93 秒[09:09:38] [INFO] [dku.utils] - INFO:2018-06-27 09:09:34,866 Stage-1 map = 100%,reduce = 90%,Cumulative CPU 16711.13 sec[09:09:38] [INFO] [dku.utils] - INFO:2018-06-27 09:09:35,907 Stage-1 map = 100%,reduce = 93%,Cumulative CPU 16795.34 sec[09:09:38] [INFO] [dku.utils] - INFO:2018-06-27 09:09:36,952 Stage-1 map = 100%,reduce = 95%,Cumulative CPU 16881.47 sec[09:09:38] [INFO] [dku.utils] - INFO:2018-06-27 09:09:37,995 阶段 1 映射 = 100%,减少 = 96%,累积 CPU 16891.18 秒[09:09:48] [INFO] [dku.utils] - INFO:2018-06-27 09:09:44,249 阶段 1 映射 = 100%,减少 = 97%,累积 CPU 16958.21 秒[09:09:48] [INFO] [dku.utils] - INFO : 2018-06-27 09:09:45,292 Stage-1 map = 100%, reduce = 98%, Cumulative CPU 17011.88 sec[09:09:48] [INFO] [dku.utils] - INFO:2018-06-27 09:09:47,378 Stage-1 map = 100%,reduce = 99%,Cumulative CPU 17055.07 sec[09:10:49] [INFO] [dku.utils] - INFO:2018-06-27 09:10:47,421 阶段 1 映射 = 100%,减少 = 99%,累积 CPU 17545.89 秒[09:11:49] [INFO] [dku.utils] - INFO:2018-06-27 09:11:47,872 阶段 1 映射 = 100%,减少 = 99%,累积 CPU 17764.45 秒[09:12:49] [INFO] [dku.utils] - INFO:2018-06-27 09:12:48,287 Stage-1 map = 100%,reduce = 99%,Cumulative CPU 18330.86 sec[09:13:49] [INFO] [dku.utils] - INFO:2018-06-27 09:13:48,855 Stage-1 map = 100%,reduce = 99%,Cumulative CPU 19232.58 sec...............[10:22:20] [INFO] [dku.utils] - INFO:2018-06-27 10:22:17,545 Stage-1 map = 100%,reduce = 99%,Cumulative CPU 26857.28 sec[10:23:23] [INFO] [dku.utils] - INFO:2018-06-27 10:23:17,654 阶段 1 映射 = 100%,减少 = 99%,累积 CPU 26974.96 秒[10:24:18] [INFO] [dku.utils] - INFO : 2018-06-27 10:24:18,112 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 27081.0 sec[10:25:23] [INFO] [dku.utils] - 信息:2018-06-27 10:25:18,964 阶段 1 映射 = 100%,减少 = 99%,累积 CPU 27187.52 秒[10:26:23] [INFO] [dku.utils] - INFO:2018-06-27 10:26:19,404 阶段 1 映射 = 100%,减少 = 99%,累积 CPU 27287.58 秒

如您所见,reduce 阶段的最后 1% 需要相当长的时间(到目前为止,总共 1 小时 32 分钟已经过去了 1 小时 27 分钟).

我很难找到能够清楚简明地解释为什么会发生这种情况的资源.还值得一提的是,我对 MapReduce 有非常基本的了解,但目前正在研究 HDFS 环境中的 MapReduce.根据我的阅读,由于执行 JOIN 的性质,可能大部分 reduce 任务都固定在一个 reducer 上.

如果有人可以对正在发生的事情进行高级解释,请指出一些好的资源的方向,并提供有关如何通过更改我的 Hive 查询在未来的工作中避免这种情况的提示,我将不胜感激.

解决方案

如果最终 reducer 是一个连接,那么它看起来像连接键中的倾斜.首先检查两件事:

检查 b.f1 连接键是否重复:

select b.f1, count(*) cnt from B b按 b.f1 分组count(*)>1 order by cnt desc;

检查a.f1的分布:

select a.f1, count(*) cnt from A a按 a.f1 分组按cnt desc订购限制 10;

此查询将显示倾斜的键.

如果存在倾斜(具有相同值的行太多),则分别连接倾斜的键,使用 union all:

选择 a.f1, f2, ..., fnFROM (select * from A where f1 = skewed_value) 作为 --skewed左连接 B 作为 b开 a.f1 = b.f1在哪里 {PARTITION_FILTER}联合所有选择 a.f1, f2, ..., fnFROM (select * from A where f1 != skewed_value) as a --all other左连接 B 作为 b开 a.f1 = b.f1在哪里 {PARTITION_FILTER}

最后,如果不存在歪斜和重复的问题,则尝试增加减速器的并行度:获取每个 reducer 配置的当前字节数

设置 hive.exec.reducers.bytes.per.reducer;

通常这将返回大约 1G 的值.尝试除以二,在查询之前设置新值并检查它将启动多少个减速器和性能.成功的标准是更多的减速器已经启动并且性能得到了提高.

设置 hive.exec.reducers.bytes.per.reducer=67108864;

每个reducer的字节数越少,启动的reducer就越多,并行度越高;

更新:尝试启用 map-join,您的第二个表足够小,可以放入内存,mapjoin 完全可以在没有减速器的情况下工作,并且减速器的倾斜也没有问题.p>

如何启用 mapjoin:https://stackoverflow.com/a/49154414/2700344

Some background information:

I'm working with Dataiku DSS, HDFS, and partitioned datasets. I have a particular job running (Hive query) which has two input datasets - one a very large, partitioned dataset, the other a small (~250 rows, 2 columns), non-partitioned dataset. Let's call the partitioned table A, and the non-partitioned table B.

Question:

The query is of the following format,

SELECT a.f1, f2, ..., fn
  FROM A as a
  LEFT JOIN B as b
  ON a.f1 = b.f1
WHERE {PARTITION_FILTER}

Here is the current ouput from the MapReduce job (keep in mind this job is still running):

[09:05:53] [INFO] [dku.utils]  - INFO  : Total jobs = 4
[09:05:53] [INFO] [dku.utils]  - INFO  : Starting task [Stage-10:CONDITIONAL] in serial mode
[09:05:53] [INFO] [dku.utils]  - INFO  : Stage-11 is filtered out by condition resolver.
[09:05:53] [INFO] [dku.utils]  - INFO  : Stage-1 is selected by condition resolver.
[09:05:53] [INFO] [dku.utils]  - INFO  : Launching Job 1 out of 4
[09:05:53] [INFO] [dku.utils]  - INFO  : Starting task [Stage-1:MAPRED] in serial mode
[09:05:53] [INFO] [dku.utils]  - INFO  : Number of reduce tasks not specified. Estimated from input data size: 307
[09:05:53] [INFO] [dku.utils]  - INFO  : In order to change the average load for a reducer (in bytes):
[09:05:53] [INFO] [dku.utils]  - INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
[09:05:53] [INFO] [dku.utils]  - INFO  : In order to limit the maximum number of reducers:
[09:05:53] [INFO] [dku.utils]  - INFO  :   set hive.exec.reducers.max=<number>
[09:05:53] [INFO] [dku.utils]  - INFO  : In order to set a constant number of reducers:
[09:05:53] [INFO] [dku.utils]  - INFO  :   set mapreduce.job.reduces=<number>
[09:05:53] [INFO] [dku.utils]  - INFO  : number of splits:75
[09:05:53] [INFO] [dku.utils]  - INFO  : Submitting tokens for job: job_1529508387832_10211
[09:05:53] [INFO] [dip.hiveserver2.log.sniffer]  - sniffed applicationId => application_1529508387832_10211/
[09:05:53] [INFO] [dku.utils]  - INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop/bin/hadoop job  -kill job_1529508387832_10211
[09:05:53] [INFO] [dku.utils]  - INFO  : Hadoop job information for Stage-1: number of mappers: 75; number of reducers: 307
[09:05:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:05:47,749 Stage-1 map = 0%,  reduce = 0%
[09:06:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:06:48,444 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 278.18 sec
[09:06:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:06:51,682 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 373.35 sec
[09:07:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:00,159 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 501.46 sec
[09:07:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:02,235 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 539.6 sec
[09:07:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:07,472 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 1389.81 sec
[09:07:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:10,605 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 2172.52 sec
[09:07:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:12,574 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 2577.81 sec
[09:07:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:13,604 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 2865.44 sec
[09:07:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:15,747 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 3110.21 sec
[09:07:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:19,898 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 4080.2 sec
[09:07:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:21,988 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 4522.48 sec
[09:07:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:23,015 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 4755.96 sec
[09:07:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:29,335 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 5710.85 sec
[09:07:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:31,407 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 5948.34 sec
[09:07:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:34,555 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 6399.6 sec
[09:07:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:37,663 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 6811.22 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:38,695 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 7087.68 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:39,729 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 7288.22 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:40,769 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 7520.54 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:41,903 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 7771.37 sec
[09:07:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:42,930 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 7936.9 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:45,035 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 8254.78 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:46,075 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 8428.35 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:47,111 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 8661.23 sec
[09:07:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:48,153 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 8834.37 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:49,193 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 8983.68 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:50,227 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 9149.94 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:51,263 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 9268.9 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:52,301 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 9415.86 sec
[09:07:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:53,352 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 9540.63 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:54,381 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 9711.54 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:55,421 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 9823.52 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:56,453 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 10010.83 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:57,492 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 10081.9 sec
[09:07:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:58,532 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 10230.13 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:07:59,576 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 10392.61 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:00,607 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 10483.38 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:01,649 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 10618.16 sec
[09:08:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:02,672 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 10684.82 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:03,695 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 10701.95 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:04,720 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 10767.21 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:05,750 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 10849.92 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:06,780 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 10924.45 sec
[09:08:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:07,902 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 11000.21 sec
[09:08:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:09,965 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 11013.58 sec
[09:08:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:10,991 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 11057.76 sec
[09:08:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:14,216 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 11157.51 sec
[09:08:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:17,362 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 11392.85 sec
[09:08:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:20,460 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 11610.7 sec
[09:08:28] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:26,106 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 11781.65 sec
[09:08:28] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:28,163 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 11788.58 sec
[09:08:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:35,410 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 12167.24 sec
[09:08:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:38,540 Stage-1 map = 86%,  reduce = 1%, Cumulative CPU 12317.09 sec
[09:08:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:39,583 Stage-1 map = 86%,  reduce = 2%, Cumulative CPU 12329.8 sec
[09:08:43] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:40,631 Stage-1 map = 86%,  reduce = 3%, Cumulative CPU 12333.61 sec
[09:08:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:47,927 Stage-1 map = 87%,  reduce = 3%, Cumulative CPU 12651.77 sec
[09:08:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:48,970 Stage-1 map = 88%,  reduce = 4%, Cumulative CPU 12826.37 sec
[09:08:53] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:51,092 Stage-1 map = 88%,  reduce = 5%, Cumulative CPU 12857.19 sec
[09:08:58] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:54,214 Stage-1 map = 90%,  reduce = 5%, Cumulative CPU 13037.63 sec
[09:09:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:08:59,396 Stage-1 map = 91%,  reduce = 5%, Cumulative CPU 13117.71 sec
[09:09:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:00,440 Stage-1 map = 92%,  reduce = 5%, Cumulative CPU 13238.06 sec
[09:09:03] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:01,485 Stage-1 map = 93%,  reduce = 5%, Cumulative CPU 13249.8 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:05,660 Stage-1 map = 94%,  reduce = 5%, Cumulative CPU 13306.0 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:06,706 Stage-1 map = 97%,  reduce = 5%, Cumulative CPU 13393.5 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:07,751 Stage-1 map = 97%,  reduce = 6%, Cumulative CPU 13409.12 sec
[09:09:08] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:08,795 Stage-1 map = 98%,  reduce = 6%, Cumulative CPU 13433.07 sec
[09:09:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:09,835 Stage-1 map = 98%,  reduce = 8%, Cumulative CPU 13474.03 sec
[09:09:13] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:10,874 Stage-1 map = 98%,  reduce = 9%, Cumulative CPU 13484.64 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:14,004 Stage-1 map = 100%,  reduce = 11%, Cumulative CPU 13580.71 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:15,118 Stage-1 map = 100%,  reduce = 13%, Cumulative CPU 13619.15 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:16,160 Stage-1 map = 100%,  reduce = 16%, Cumulative CPU 13707.2 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:17,210 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 14456.75 sec
[09:09:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:18,258 Stage-1 map = 100%,  reduce = 39%, Cumulative CPU 14708.07 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:19,291 Stage-1 map = 100%,  reduce = 40%, Cumulative CPU 14768.29 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:20,329 Stage-1 map = 100%,  reduce = 41%, Cumulative CPU 14834.88 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:21,360 Stage-1 map = 100%,  reduce = 42%, Cumulative CPU 14902.4 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:22,399 Stage-1 map = 100%,  reduce = 45%, Cumulative CPU 15040.16 sec
[09:09:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:23,433 Stage-1 map = 100%,  reduce = 47%, Cumulative CPU 15165.58 sec
[09:09:28] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:28,627 Stage-1 map = 100%,  reduce = 63%, Cumulative CPU 15792.29 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:30,711 Stage-1 map = 100%,  reduce = 64%, Cumulative CPU 15889.21 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:31,753 Stage-1 map = 100%,  reduce = 65%, Cumulative CPU 15898.95 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:32,789 Stage-1 map = 100%,  reduce = 66%, Cumulative CPU 15927.26 sec
[09:09:33] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:33,822 Stage-1 map = 100%,  reduce = 71%, Cumulative CPU 16086.93 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:34,866 Stage-1 map = 100%,  reduce = 90%, Cumulative CPU 16711.13 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:35,907 Stage-1 map = 100%,  reduce = 93%, Cumulative CPU 16795.34 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:36,952 Stage-1 map = 100%,  reduce = 95%, Cumulative CPU 16881.47 sec
[09:09:38] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:37,995 Stage-1 map = 100%,  reduce = 96%, Cumulative CPU 16891.18 sec
[09:09:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:44,249 Stage-1 map = 100%,  reduce = 97%, Cumulative CPU 16958.21 sec
[09:09:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:45,292 Stage-1 map = 100%,  reduce = 98%, Cumulative CPU 17011.88 sec
[09:09:48] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:09:47,378 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 17055.07 sec
[09:10:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:10:47,421 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 17545.89 sec
[09:11:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:11:47,872 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 17764.45 sec
[09:12:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:12:48,287 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 18330.86 sec
[09:13:49] [INFO] [dku.utils]  - INFO  : 2018-06-27 09:13:48,855 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 19232.58 sec
...
...
...
...
...
[10:22:20] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:22:17,545 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 26857.28 sec
[10:23:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:23:17,654 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 26974.96 sec
[10:24:18] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:24:18,112 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 27081.0 sec
[10:25:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:25:18,964 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 27187.52 sec
[10:26:23] [INFO] [dku.utils]  - INFO  : 2018-06-27 10:26:19,404 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 27287.58 sec

As you can see, the final 1% of the reduce phase is taking quite a while (so far 1h27min of the total 1h32min elapsed).

I'm having trouble finding resources which give a clear and concise explanation to why this is happening. It's also worth mentioning I have a very basic understanding of MapReduce, but am currently in the process of studying MapReduce in an HDFS context. From what I've read, it is possible that the bulk of the reduce task was pinned on one reducer due to the nature of the JOIN being performed.

If anyone can give a high-level explanation of what is happening, point me in the direction of some good resources, and give tips on how I can avoid this in future jobs by altering my Hive queries, that would be greatly appreciated.

解决方案

If final reducer is a join then it looks like skew in join key. First of all check two things:

check that b.f1 join key has no duplicates:

select b.f1, count(*) cnt from B b 
 group by b.f1 
having count(*)>1 order by cnt desc;

check the distribution of a.f1:

select a.f1, count(*) cnt from A a
 group by a.f1  
order by cnt desc
limit 10;

This query will show skewed keys.

If there is a skew (too many rows with the same value) then join skewed keys separately, use union all:

SELECT a.f1, f2, ..., fn
  FROM ( select * from A where f1 = skewed_value) as a --skewed
  LEFT JOIN B as b
  ON a.f1 = b.f1
WHERE {PARTITION_FILTER}
UNION ALL
SELECT a.f1, f2, ..., fn
  FROM ( select * from A where f1 != skewed_value) as a --all other
  LEFT JOIN B as b
  ON a.f1 = b.f1
WHERE {PARTITION_FILTER}

And finally if there is no issues with skew and duplication, then try to increase reducers parallelism: Get current bytes per reducer configuration

set hive.exec.reducers.bytes.per.reducer;

typically this will return some value about 1G. Try to divide by two, set new value before your query and check how many reducers will it start and performance. Success criteria is more reducers has started and performance improved.

set hive.exec.reducers.bytes.per.reducer=67108864;

The less the bytes per reducer the more reducers will be started, increasing parallelism;

UPDATE: Try to enable map-join, your second table is small enough to fit in memory, mapjoin will work without reducers at all and it will be no problem with skew on reducers.

How to enable mapjoin: https://stackoverflow.com/a/49154414/2700344

这篇关于为什么这个 MapReduce 中最后的 reduce 步骤非常慢?(HiveQL、HDFS MapReduce)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆