How to fix "Task attempt_201104251139_0295_r_000006_0 failed to report status for 600 seconds."
Problem description
I wrote a MapReduce job to extract some information from a dataset of users' movie ratings. There are about 250K users and about 300K movies. The map output is <user, <movie, rating>*> and <movie, <user, rating>*>. In the reducer, I process these pairs.
When I run the job, the mappers complete as expected, but the reducers always complain that
Task attempt_* failed to report status for 600 seconds.
I know this is caused by the task failing to update its status, so I added a call to context.progress() in my code like this:
int count = 0;
while (values.hasNext()) {
    // Report progress once every 100 values
    if (count++ % 100 == 0) {
        context.progress();
    }
    /* other code here */
}
Unfortunately, this does not help. Many reduce tasks still fail.
Here is the log:
Task attempt_201104251139_0295_r_000014_1 failed to report status for 600 seconds. Killing!
11/05/03 10:09:09 INFO mapred.JobClient: Task Id : attempt_201104251139_0295_r_000012_1, Status : FAILED
Task attempt_201104251139_0295_r_000012_1 failed to report status for 600 seconds. Killing!
11/05/03 10:09:09 INFO mapred.JobClient: Task Id : attempt_201104251139_0295_r_000006_1, Status : FAILED
Task attempt_201104251139_0295_r_000006_1 failed to report status for 600 seconds. Killing!
BTW, the error happened in the copy phase of the reduce; the log says:
reduce > copy (28 of 31 at 26.69 MB/s) > :Lost task tracker: tracker_hadoop-56:localhost/127.0.0.1:34385
Thanks for your help.
Recommended answer
The easiest way is to set this configuration parameter:
<property>
<name>mapred.task.timeout</name>
<value>1800000</value> <!-- 30 minutes -->
</property>
in mapred-site.xml.
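The value of mapred.task.timeout is given in milliseconds, so the 600 seconds in the error message corresponds to 600000 and 30 minutes to 1800000. As a minimal sketch of that conversion (the class and method names TaskTimeout, toMillis and propertyXml are hypothetical, chosen only for illustration; they are not part of Hadoop):

```java
import java.util.concurrent.TimeUnit;

public class TaskTimeout {
    // Property name taken from the answer above.
    static final String PROPERTY = "mapred.task.timeout";

    // Convert a timeout in minutes to the millisecond value the property expects.
    static long toMillis(long minutes) {
        return TimeUnit.MINUTES.toMillis(minutes);
    }

    // Render a <property> fragment of the kind placed in mapred-site.xml.
    static String propertyXml(long minutes) {
        return "<property>\n"
             + "<name>" + PROPERTY + "</name>\n"
             + "<value>" + toMillis(minutes) + "</value>\n"
             + "</property>";
    }

    public static void main(String[] args) {
        // Print the fragment for a 30-minute timeout.
        System.out.println(propertyXml(30));
    }
}
```

If the timeout should apply to a single job rather than the whole cluster, the same millisecond value could presumably be set on the job's Configuration before submission, for example with conf.setLong("mapred.task.timeout", TaskTimeout.toMillis(30)); that is a suggestion beyond what the answer states.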