Error handling in Hadoop MapReduce


Problem description

Based on the documentation, there are a few ways error handling can be performed in MapReduce. Below are a few:

a. Custom counters using an enum, incremented for every failed record.

b. Log the errors and analyze them later.

Counters give the number of failed records. However, to get the identifier of a failed record (perhaps its unique key), the details of the exception, and the node on which the error occurred, we need to perform centralized log analysis, and there are many nodes running. Logstash is one tool that is available for this.

Apart from these, are there any other ways to handle error scenarios without manual intervention? Any tools, references, and best practices are welcome.

I think the same techniques apply to any distributed application, with minor changes.

Recommended answer

A few questions to ask when working with error handling:


  1. Should the job be stopped if an error occurs during data validation? Most big data use cases can tolerate leaving a few bad records. But if your use case requires all records to be good, you should take that decision and move on to the steps below.

Sometimes it is better to let the job run by skipping the bad records and, in parallel, surface the issues (errors) using the techniques below, then rectify them and adjust as you go along.

You may accept that errors occur, but only a limited number of times. How many times an exception can be thrown before the entire job is stopped is controlled as follows:

For map tasks: the mapreduce.map.maxattempts property

For reduce tasks: the mapreduce.reduce.maxattempts property

The default value is 4.
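As a minimal sketch (an addition, not part of the original answer), these retry limits can also be set programmatically on the job configuration; the property keys are the standard Hadoop 2.x names, and the class and job names here are purely illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RetryConfigExample {
    public static Job buildJob() throws Exception {
        Configuration conf = new Configuration();
        // Maximum attempts per map task before the task (and hence the job) is failed.
        conf.setInt("mapreduce.map.maxattempts", 4);
        // Maximum attempts per reduce task.
        conf.setInt("mapreduce.reduce.maxattempts", 4);
        return Job.getInstance(conf, "error-handling-example"); // illustrative job name
    }
}

The same values can also be set cluster-wide in mapred-site.xml; the programmatic form is shown only to keep the example self-contained.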

Handling malformed data.

So we have decided to handle the malformed data. Then define the condition under which a record is bad. You can use counters to quickly get the number of bad records.

In the Mapper class,

enum Temperature { OVER_10 }

Inside the map method:

// parse the record
if (value > 10) {
    System.err.println("Temperature over 10 degrees for input: " + value);
    context.setStatus("Detected possibly corrupt record: see logs.");
    context.getCounter(Temperature.OVER_10).increment(1);
}
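Putting the fragments above together, a complete Mapper might look like the hedged sketch below; the class name, the input format (one integer reading per line), and the output key/value types are assumptions made only for illustration, while the counter pattern follows the snippet above:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    enum Temperature { OVER_10 }

    @Override
    protected void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
        int value;
        try {
            // Assumes each input line is a single integer temperature reading.
            value = Integer.parseInt(line.toString().trim());
        } catch (NumberFormatException e) {
            // Skip lines that cannot be parsed instead of failing the task attempt.
            return;
        }
        if (value > 10) {
            System.err.println("Temperature over 10 degrees for input: " + value);
            context.setStatus("Detected possibly corrupt record: see logs.");
            context.getCounter(Temperature.OVER_10).increment(1);
        }
        context.write(new Text("temperature"), new IntWritable(value));
    }
}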

With the above method, all records get processed, and the counter is incremented for the bad records. You can see the counter value at the end of the job in the job statistics, through the web UI, or from the shell command:

$ mapred job -counter <job_id> '${fully_qualified_class_name}' ${enum_name}
$ mapred job -counter job_1444655904448_17959 'com.YourMapper$Temperature' OVER_10
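As a hedged alternative (not in the original answer), the same counter can be read programmatically in the driver once the job finishes; the helper below assumes the Job instance and the TemperatureMapper.Temperature enum from the sketches above:

import org.apache.hadoop.mapreduce.Job;

public class CounterCheck {
    // Assumes 'job' is the configured Job from the earlier sketch and that
    // TemperatureMapper.Temperature.OVER_10 is the counter incremented in the mapper.
    static void runAndReport(Job job) throws Exception {
        if (job.waitForCompletion(true)) {
            long badRecords = job.getCounters()
                                 .findCounter(TemperatureMapper.Temperature.OVER_10)
                                 .getValue();
            System.out.println("Records flagged as possibly corrupt: " + badRecords);
        }
    }
}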

Once you know the impact of the problem, i.e. the number of bad records, we need to know why they are bad. For this, we need to go to the logs and search for the error messages.

YARN provides log aggregation, which combines all the logs for a job ID and stores them in HDFS. They can be retrieved using:

yarn logs -applicationId <application ID>

