Out of memory error in Mapreduce shuffle phase

Problem description

I am getting strange errors while running a wordcount-like mapreduce program. I have a hadoop cluster with 20 slaves, each having 4 GB RAM. I configured my map tasks to have a heap of 300MB and my reduce task slots get 1GB. I have 2 map slots and 1 reduce slot per node. Everything goes well until the first round of map tasks finishes; then the progress remains at 100%, and I suppose the copy phase is taking place. Each map task generates something like:

Map output bytes    4,164,335,564
Map output materialized bytes   608,800,675

(I am using SnappyCodec for compression)
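
For reference, the setup described above corresponds roughly to the following Hadoop 1.x properties in mapred-site.xml (a sketch reconstructed from the description, not the poster's actual files; exact property names vary between Hadoop versions):

<!-- mapred-site.xml: sketch of the configuration described above -->
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx300m</value>   <!-- 300 MB heap per map task -->
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx1024m</value>  <!-- 1 GB heap per reduce task -->
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>          <!-- 2 map slots per node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>          <!-- 1 reduce slot per node -->
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>       <!-- compress map output with Snappy -->
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>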

    Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1703)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1563)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1401)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1333)

While googling I found this link, but I don't really know what to make of it: hadoop common link

I don't understand why hadoop would experience any problems in copying and merging if it is able to perform a terasort benchmark. It cannot be that all map output is expected to fit into the RAM of the reducer thread. So what is going on here?

In the link provided above they have a discussion about tuning the following parameters:

mapreduce.reduce.shuffle.input.buffer.percent = 0.7
mapreduce.reduce.shuffle.memory.limit.percent = 0.25
mapreduce.reduce.shuffle.parallelcopies = 5
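
To make the rule of thumb concrete, here is the back-of-the-envelope arithmetic for the 1 GB reduce heap described above (a rough sketch of how these three values interact; actual JVM accounting is more involved):

shuffle buffer        = 1024 MB * 0.70  = ~717 MB   (input.buffer.percent of the reduce heap)
single-fetch limit    = 717 MB * 0.25   = ~179 MB   (memory.limit.percent of that buffer)
worst case in flight  = 5 * 179 MB      = ~896 MB   (parallelcopies simultaneous fetches)
fraction of the heap  = 0.70 * 0.25 * 5 = 0.875 < 1

So with these values the concurrent in-memory fetches alone should stay just under the heap, which is why the product-of-parameters check does not flag this configuration.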

They claim that the fact that the product of the parameters is >1 allows for heapsize errors. (EDIT: note that 5 * 0.25 * 0.7 is still <1, so focus on my second solution post!) Before restarting this intensive simulation I would be very happy to hear someone's opinion on the problem I am facing, since it has been bothering me for almost a week now. I also don't seem to completely understand what is happening in this copy phase; I'd expect a merge sort on disk not to require much heap space.

Thanks a lot in advance for any helpful comments and answers!

Answer

I think the clue is that the heapsize of my reduce task was required almost completely for the reduce phase, while the shuffle phase was competing for the same heap space; the resulting conflict caused my jobs to crash. I think this explains why the job no longer crashes if I lower shuffle.input.buffer.percent.
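
In other words, the fix is to shrink the shuffle's share of the reduce heap so it stops competing with the reduce phase. A minimal sketch of that change (the 0.2 value is illustrative, not taken from the original post; on Hadoop 1.x the property is spelled mapred.job.shuffle.input.buffer.percent):

<property>
  <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
  <!-- illustrative value: well below the 0.7 default, leaving most of the heap to the reduce phase -->
  <value>0.2</value>
</property>

Lowering this percentage shrinks the in-memory shuffle buffer, forcing more map outputs to be merged on disk, which trades some extra I/O for the heap headroom the reduce phase needs.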
