Successful task generates mapreduce.counters.LimitExceededException when trying to commit


Problem Description


I have a Pig script running in MapReduce mode that's been receiving a persistent error which I've been unable to fix. The script spawns multiple MapReduce applications; after running for several hours one of the applications registers as SUCCEEDED but returns the following diagnostic message:

We crashed after successfully committing. Recovering.

The step that causes the failure is trying to perform a RANK over a dataset that's around 100GB, split across roughly 1000 mapreduce output files from a previous script. But I've also received the same error for other scripts trying to do a large HASH_JOIN operation.
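For context, the failing step has roughly this shape in Pig Latin (the paths and schema below are hypothetical placeholders, not taken from the original script):

```pig
-- Hypothetical sketch of the failing step: RANK over ~100GB of
-- output files produced by a previous MapReduce script.
-- Paths and field names are placeholders.
prev = LOAD '/data/previous_output/part-*' USING PigStorage('\t')
       AS (id:chararray, score:double);

-- RANK triggers a global ordering, which Pig compiles into an
-- additional MapReduce job over the full dataset.
ranked = RANK prev BY score DESC;

STORE ranked INTO '/data/ranked_output';
```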

Digging into the logs, I find the following, which also seems to indicate that the job succeeded but then received an error winding down:

INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1523471594178_0475_m_001006_0 TaskAttempt Transitioned from COMMIT_PENDING to SUCCESS_CONTAINER_CLEANUP
INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e15_1523471594178_0475_01_001013 taskAttempt attempt_1523471594178_0475_m_001006_0
INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1523471594178_0475_m_001006_0
INFO [ContainerLauncher #6] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : my.server.name:45454
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1523471594178_0475_m_001006_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1523471594178_0475_m_001006_0
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1523471594178_0475_m_001006 Task Transitioned from RUNNING to SUCCEEDED
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1011
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1523471594178_0475Job Transitioned from RUNNING to COMMITTING
INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:2 AssignedReds:0 CompletedMaps:1011 CompletedReds:0 ContAlloc:1011 ContRel:0 HostLocal:1010 RackLocal:1
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e15_1523471594178_0475_01_001014
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e15_1523471594178_0475_01_001013
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1011 CompletedReds:0 ContAlloc:1011 ContRel:0 HostLocal:1010 RackLocal:1
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1523471594178_0475_m_001007_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143. 
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1523471594178_0475_m_001006_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143. 
FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101)
at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203)
at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1766)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1752)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1733)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1092)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2064)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2060)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:999)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:139)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1385)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1381)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.

I've tried several methods of resolving the mapreduce.counters.LimitExceededException. I've modified the MapReduce configs in Ambari to set mapreduce.job.counters.max to 20000 (just in an effort to test a resolution to this issue, not with the intent of leaving it there). I also tried starting my Pig script with the line set mapreduce.job.counters.max 10000; in an effort to override the max counters. Neither change appears to have any impact; the error still displays a limit of 120.
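For reference, the cluster-wide override attempted through Ambari corresponds to this mapred-site.xml fragment (the 20000 value was only for testing, as noted above):

```xml
<!-- mapred-site.xml: raise the per-job counter ceiling (test value only) -->
<property>
  <name>mapreduce.job.counters.max</name>
  <value>20000</value>
</property>
```

One plausible explanation for the override having no effect, consistent with the error still reporting max=120: on a number of Hadoop 2.x versions the counter `Limits` class is initialized statically from whatever configuration the reading process loads, so the ApplicationMaster can end up enforcing its own (default) limit regardless of client-side or per-job overrides. This is a hedged interpretation, not something confirmed by the logs above.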

I'm confused why changing the max counters configuration doesn't seem to be having an impact. Is there some related configuration I could be missing? Or is this error message possibly inaccurate, or a symptom that signifies a different issue?

UPDATE: I've found a number of Apache MapReduce Jira tickets that seem to be related to this issue; it seems like it's an existing bug. I've switched to running my jobs on Tez, which eliminates the issue, but I've experienced major performance problems on Tez so I'm still hoping someone has a workaround for this on the MR engine.

Solution

This `<property>` is set in mapred-site.xml; here is another similar question.

Here is another approach you could try, though I don't know whether it works:

Create a `job-local.xml` and set the `<property>` there, then read it in your project with `conf.get("mapreduce.job.counters.limit")`, or set it programmatically with `conf.set("mapreduce.job.counters.limit", "200");`.

If you have changed the setting in your mapred-site.xml, please check that it actually takes effect.
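A sketch of the `job-local.xml` file the answer describes (the property name `mapreduce.job.counters.limit` is taken from the answer as written; on Hadoop 2.x the effective name may instead be `mapreduce.job.counters.max`, and whether a job-local file reaches the ApplicationMaster's limit check is not verified):

```xml
<?xml version="1.0"?>
<!-- job-local.xml: per-job counter limit override (sketch, unverified) -->
<configuration>
  <property>
    <name>mapreduce.job.counters.limit</name>
    <value>200</value>
  </property>
</configuration>
```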
