Unusual Hadoop error - tasks get killed on their own

This post walks through an unusual Hadoop error in which tasks get killed on their own; the question and answer below may be a useful reference for anyone hitting the same problem.

Problem description

When I run my Hadoop job, I get the following error:

Request received to kill task 'attempt_201202230353_23186_r_000004_0' by user
Task has been KILLED_UNCLEAN by the user

The logs appear to be clean. I run 28 reducers, and this doesn't happen for all of them. It happens for a select few, and those reducers then start again. I fail to understand this. Another thing I have noticed is that for a small dataset I rarely see this error!

Solution

There are three things to try:

Setting a Counter
If Hadoop sees a counter for the job progressing, then it won't kill the task (see Arockiaraj Durairaj's answer). This seems to be the most elegant option, as it also gives you more insight into long-running jobs and where the hang-ups may be.
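As a rough sketch of what the counter approach can look like (written against the org.apache.hadoop.mapreduce API; the class name, key/value types, and the Progress counter enum are hypothetical placeholders, not taken from the original question):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer used only to illustrate the counter idea.
public class AudioReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // Any custom counter works; it shows up in the job UI and, because it keeps
    // moving, tells the framework that this task is still making progress.
    public enum Progress { RECORDS_PROCESSED }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable value : values) {
            sum += value.get();
            // Increment the counter (and/or call context.progress()) inside any
            // long-running loop so the task is not killed as unresponsive.
            context.getCounter(Progress.RECORDS_PROCESSED).increment(1);
            context.progress();
        }
        context.write(key, new LongWritable(sum));
    }
}
```

The key point is simply that something observable (a counter or a progress() call) is updated regularly while the reducer works through a long group of values.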

Longer Task Timeouts
Hadoop tasks time out after 10 minutes by default. Changing the timeout is somewhat brute force, but it can work. Imagine analyzing audio files that are generally 5 MB (single songs), but you have a few 50 MB files (entire albums). HDFS stores each file in its own block(s), so if your HDFS block size is 64 MB, then a 5 MB file and a 50 MB file each occupy 1 block (see here http://blog.cloudera.com/blog/2009/02/the-small-files-problem/, and here: Small files and HDFS blocks). However, the 5 MB task would run much faster than the 50 MB one, and the larger files may be the ones hitting the timeout. The task timeout can be increased in the job's code (mapred.task.timeout), per the answers to this similar question: How to fix "Task attempt_201104251139_0295_r_000006_0 failed to report status for 600 seconds."
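A minimal sketch of raising the timeout from a Java driver, assuming the old-style mapred.task.timeout property name mentioned above; the 30-minute value and the job name are arbitrary placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TimeoutDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // mapred.task.timeout is in milliseconds; the default is 600000 (10 minutes)
        // and 0 disables the timeout entirely. Here it is raised to 30 minutes.
        conf.setLong("mapred.task.timeout", 30 * 60 * 1000L);

        Job job = new Job(conf, "audio-analysis"); // newer releases prefer Job.getInstance(conf, ...)
        // ... set mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The same property can also be passed on the command line (e.g. -D mapred.task.timeout=1800000) if the driver is run through ToolRunner/GenericOptionsParser.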

Increase Task Attempts
Configure Hadoop to make more than the default 4 attempts per task (see Pradeep Gollakota's answer). This is the most brute-force method of the three. Hadoop will retry the task more times, but you could be masking an underlying issue (small servers, large data blocks, etc.).
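And a sketch of one way to raise the retry limit from the driver, assuming the Hadoop 1.x-era property names mapred.map.max.attempts and mapred.reduce.max.attempts (newer releases spell them mapreduce.map.maxattempts and mapreduce.reduce.maxattempts); the value of 8 and the job name are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RetryDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Allow each map and reduce task up to 8 attempts (the default is 4)
        // before the whole job is declared failed.
        conf.setInt("mapred.map.max.attempts", 8);
        conf.setInt("mapred.reduce.max.attempts", 8);

        Job job = new Job(conf, "job-with-extra-retries");
        // ... set mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```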

That concludes this post on the unusual Hadoop error where tasks get killed on their own; hopefully the answer above is helpful to anyone running into the same issue.
