Standard practices for logging in MapReduce jobs

Question

I'm trying to find the best approach for logging in MapReduce jobs. I'm using slf4j with a log4j appender, as in my other Java applications, but since a MapReduce job runs in a distributed manner across the cluster, I don't know where I should set the log file location, especially since it is a shared cluster with limited access privileges.

Are there any standard practices for logging in MapReduce jobs, so that you can easily look at the logs across the cluster after the job completes?

Recommended answer

You could use log4j, which is the default logging framework that Hadoop uses. From your MapReduce application, you could do something like this:

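import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;

public class SampleMapper extends Mapper<LongWritable, Text, Text, Text> {
    // One logger per class; the task JVM's log4j configuration (set up by Hadoop)
    // decides where its output goes, so no log file path is configured here.
    private static final Logger logger = Logger.getLogger(SampleMapper.class);

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        logger.info("Initializing NoSQL Connection.");
        try {
            // logic for connecting to NoSQL - omitted
        } catch (Exception ex) {
            // Passing the exception as the second argument also logs its stack trace.
            logger.error("Failed to initialize NoSQL connection", ex);
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // mapper code omitted
    }
}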

This sample code uses a log4j logger, and Hadoop's own log4j configuration for each task decides where the events are written, so all the log events end up in their respective task logs. You can view the task logs from either the JobTracker (MRv1) or the ResourceManager (MRv2) web UI.
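
Once you have a task's syslog (downloaded from the web UI, for example), you usually only want the lines written by your own class. The following is a minimal sketch, not part of Hadoop: TaskLogFilter is a hypothetical helper, and it assumes the task log uses the layout %d{ISO8601} %p [%t] %c: %m. That is a common layout in Hadoop's task log4j configuration, but check the conversion pattern on your cluster before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: filters lines from a downloaded task syslog.
 * ASSUMPTION: lines follow the layout "%d{ISO8601} %p [%t] %c: %m", e.g.
 * "2015-03-02 10:15:30,123 INFO [main] com.example.SampleMapper: message".
 */
final class TaskLogFilter {
    // Groups: 1 = timestamp, 2 = level, 3 = thread, 4 = logger name, 5 = message.
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+ \\S+) (TRACE|DEBUG|INFO|WARN|ERROR|FATAL) \\[([^\\]]*)\\] ([^:\\s]+): (.*)$");

    /** Returns the messages logged by {@code loggerName} at exactly {@code level}. */
    static List<String> filter(List<String> lines, String loggerName, String level) {
        List<String> messages = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            // Non-matching lines (e.g. stack-trace continuation lines) are skipped.
            if (m.matches() && m.group(2).equals(level) && m.group(4).equals(loggerName)) {
                messages.add(m.group(5));
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "2015-03-02 10:15:30,123 INFO [main] com.example.SampleMapper: Initializing NoSQL Connection.",
            "2015-03-02 10:15:31,456 ERROR [main] com.example.SampleMapper: Failed to connect to NoSQL store",
            "java.net.ConnectException: Connection refused",
            "2015-03-02 10:15:31,500 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split");
        // Prints [Failed to connect to NoSQL store]
        System.out.println(filter(sample, "com.example.SampleMapper", "ERROR"));
    }
}
```

The sample lines in main are made up for illustration. Matching on the full logger name works because the sample mapper creates its logger with Logger.getLogger(SampleMapper.class), which names the logger after the fully qualified class.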

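If you are using YARN, you can access the application logs from the command line with the following command. It reads the aggregated logs, so log aggregation (yarn.log-aggregation-enable in yarn-site.xml) must be enabled on the cluster: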

yarn logs -applicationId <application_id>
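
This command prints the logs of all the application's containers one after another on standard output. To look at a single task's output, it can help to split that text per container. The sketch below is hypothetical (YarnLogSplitter is not a Hadoop class) and assumes each container section starts with a header line beginning with "Container: " followed by the container ID. The exact header layout differs between Hadoop versions, so check your own output first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: splits the output of "yarn logs -applicationId <application_id>"
 * into one section per container.
 * ASSUMPTION: each section starts with a line like "Container: <container_id> on <host>".
 */
final class YarnLogSplitter {
    private static final String HEADER = "Container: ";

    /** Maps container ID to the lines that follow its header(s), in output order. */
    static Map<String, List<String>> splitByContainer(List<String> lines) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : lines) {
            if (line.startsWith(HEADER)) {
                // The container ID is the first token after the "Container: " prefix.
                String id = line.substring(HEADER.length()).trim().split("\\s+")[0];
                // A repeated header for the same container appends to its existing section.
                current = sections.computeIfAbsent(id, k -> new ArrayList<>());
            } else if (current != null) {
                current.add(line);
            }
            // Lines before the first header are ignored.
        }
        return sections;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        // Print a short summary: one line per container with its line count.
        splitByContainer(lines).forEach((id, body) ->
            System.out.println(id + ": " + body.size() + " lines"));
    }
}
```

With Java 11 or later, you could pipe the command's output straight into it, e.g. yarn logs -applicationId <application_id> | java YarnLogSplitter.java. A LinkedHashMap is used so the containers are reported in the order yarn logs printed them.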

If you are using MapReduce v1, there is no single point of access from the command line, so you have to log in to each TaskTracker and look in the configured log path, ${hadoop.log.dir}/userlogs (generally /var/log/hadoop/userlogs). The syslog file of each task attempt, e.g. /var/log/hadoop/userlogs/attempt_<job_id>/syslog, contains the log4j output.
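
Because MRv1 has no central command for this, a small program run on each TaskTracker host can at least list all the syslog files in one go. This is a sketch under assumptions: UserlogsFinder is a hypothetical name, and it takes the userlogs root as its first argument, defaulting to /var/log/hadoop/userlogs; pass the ${hadoop.log.dir}/userlogs value configured on your cluster instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper to run on a TaskTracker host: lists every task "syslog"
 * file (the log4j output) below the userlogs directory.
 * ASSUMPTION: the root is ${hadoop.log.dir}/userlogs, e.g. /var/log/hadoop/userlogs.
 */
final class UserlogsFinder {
    /** Returns all regular files named "syslog" below {@code userlogsDir}, sorted by path. */
    static List<Path> findSyslogs(Path userlogsDir) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(userlogsDir)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().equals("syslog"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "/var/log/hadoop/userlogs");
        for (Path syslog : findSyslogs(root)) {
            // The parent directory name identifies the task attempt.
            System.out.println(syslog.getParent().getFileName() + " -> " + syslog);
        }
    }
}
```

It only lists the files; you still need read access to the userlogs directory on each host, which may not be granted on a shared cluster with limited privileges.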
