Successful task generates mapreduce.counters.LimitExceededException when trying to commit

Problem Description

I have a Pig script running in MapReduce mode that's been receiving a persistent error which I've been unable to fix. The script spawns multiple MapReduce applications; after running for several hours one of the applications registers as SUCCEEDED but returns the following diagnostic message:

We crashed after successfully committing. Recovering.

The step that causes the failure is trying to perform a RANK over a dataset that's around 100GB, split across roughly 1000 mapreduce output files from a previous script. But I've also received the same error for other scripts trying to do a large HASH_JOIN operation.

Digging into the logs, I find the following, which also seems to indicate that the job succeeded but then received an error winding down:

INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1523471594178_0475_m_001006_0 TaskAttempt Transitioned from COMMIT_PENDING to SUCCESS_CONTAINER_CLEANUP
INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e15_1523471594178_0475_01_001013 taskAttempt attempt_1523471594178_0475_m_001006_0
INFO [ContainerLauncher #6] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1523471594178_0475_m_001006_0
INFO [ContainerLauncher #6] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : my.server.name:45454
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1523471594178_0475_m_001006_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1523471594178_0475_m_001006_0
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1523471594178_0475_m_001006 Task Transitioned from RUNNING to SUCCEEDED
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1011
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1523471594178_0475Job Transitioned from RUNNING to COMMITTING
INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:2 AssignedReds:0 CompletedMaps:1011 CompletedReds:0 ContAlloc:1011 ContRel:0 HostLocal:1010 RackLocal:1
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e15_1523471594178_0475_01_001014
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e15_1523471594178_0475_01_001013
INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1011 CompletedReds:0 ContAlloc:1011 ContRel:0 HostLocal:1010 RackLocal:1
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1523471594178_0475_m_001007_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143. 
INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1523471594178_0475_m_001006_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143. 
FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101)
at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203)
at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1766)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1752)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1733)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1092)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2064)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2060)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:999)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:139)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1385)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1381)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.

I've tried several methods of resolving the mapreduce.counters.LimitExceededException. I've modified the MapReduce configs in Ambari to set mapreduce.job.counters.max to 20000 (just in an effort to test a resolution to this issue, not with the intent of leaving it there). I also tried starting my Pig script with the line set mapreduce.job.counters.max 10000; in an effort to override the max counters. Neither change appears to have any impact; the error still displays a limit of 120.

I'm confused why changing the max counters configuration doesn't seem to be having an impact. Is there some related configuration I could be missing? Or is this error message possibly inaccurate, or a symptom that signifies a different issue?

UPDATE: I've found a number of Apache MapReduce Jira tickets that seem to be related to this issue; it seems like it's an existing bug. I've switched to running my jobs on Tez, which eliminates the issue, but I've experienced major performance problems on Tez so I'm still hoping someone has a workaround for this on the MR engine.

Recommended Answer

This property is set in mapred-site.xml; there is another, similar question that covers the same setting.
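
For reference, a cluster-level override in mapred-site.xml would look roughly like the sketch below. This is only a sketch: the 20000 value mirrors the value used for testing in the question, the deprecated property name is included on the assumption that older Hadoop releases still read it, and on some versions the limit is only read when the framework's Limits class is first initialized, which may be why a per-job override appears to be ignored.

<!-- Sketch of a mapred-site.xml override; the value mirrors the 20000 used for testing above -->
<property>
  <name>mapreduce.job.counters.max</name>
  <value>20000</value>
</property>
<!-- Assumption: older Hadoop releases read the deprecated name instead -->
<property>
  <name>mapreduce.job.counters.limit</name>
  <value>20000</value>
</property>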

Here is another approach you can try, though I am not sure whether it will work:

Create a job-local.xml and set the property there; read it back in your project with con.get("mapreduce.job.counters.limit"), or set it programmatically with con.set("mapreduce.job.counters.limit", "200");
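
For the programmatic route, here is a minimal, hedged Java sketch of what that might look like in a driver that builds and submits the job. The class name, the job name, and the conf variable (called con in the answer above) are illustrative, and whether the ApplicationMaster honors a per-job override depends on the Hadoop version in use:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CounterLimitDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Raise the per-job counter limit before the Job is created,
        // so the value ends up in the submitted job configuration.
        conf.set("mapreduce.job.counters.max", "200");
        // Assumption: older releases still read the deprecated name, so set both.
        conf.set("mapreduce.job.counters.limit", "200");

        // Reading it back, as suggested above, shows what the client-side config contains.
        System.out.println("counter limit = " + conf.get("mapreduce.job.counters.limit"));

        Job job = Job.getInstance(conf, "counter-limit-test");
        // ... set mapper, reducer, and input/output paths here, then:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}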

If you have already changed the setting in your mapred-site.xml, please check that it is actually taking effect.
