Is there a way to access number of successful map tasks from a reduce task in an MR job?

Problem Description

In my Hadoop reducers, I need to know how many successful map tasks were executed in the current job. I've come up with the following, which as far as I can tell does NOT work.

    Counter totalMapsCounter = 
        context.getCounter(JobInProgress.Counter.TOTAL_LAUNCHED_MAPS);
    Counter failedMapsCounter = 
        context.getCounter(JobInProgress.Counter.NUM_FAILED_MAPS);
    long nSuccessfulMaps = totalMapsCounter.getValue() - 
                           failedMapsCounter.getValue();

Alternatively, if there's a good way that I could retrieve (again, from within my reducers) the total number of input splits (not number of files, and not splits for one file, but total splits for the job), that would probably also work. (Assuming my job completes normally, that should be the same number, right?)

Recommended Answer

It looks like it is not good practice to retrieve the counters in the map and reduce tasks using Job or JobConf. Here is an alternate approach for passing summary details from the mapper to the reducer. This approach requires some effort to code, but is doable. It would have been nice if the feature were part of Hadoop and did not have to be hand coded. I have requested that this feature be added to Hadoop and am waiting for a response.
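As a rough sketch of one such hand-coded pattern (the reserved key "__MAP_TASK_MARKER__", the Text/LongWritable types and the new-API method names are illustrative choices, not from the linked write-up), each map task can emit a single summary record in cleanup(), and the reducer can sum those records:

// Illustrative sketch (new MR API); assumes Text keys and LongWritable values.
// import java.io.IOException;
// import org.apache.hadoop.io.LongWritable;
// import org.apache.hadoop.io.Text;

// In the Mapper: emit one marker record when the map task finishes.
@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    context.write(new Text("__MAP_TASK_MARKER__"), new LongWritable(1));
}

// In the Reducer: sum the marker records to learn how many map tasks completed.
// All markers must be routed to the same reducer, e.g. with a custom Partitioner.
@Override
protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
    if ("__MAP_TASK_MARKER__".equals(key.toString())) {
        long completedMaps = 0;
        for (LongWritable value : values) {
            completedMaps += value.get();
        }
        System.out.println("Completed map tasks: " + completedMaps);
        return;
    }
    // ... normal reduce logic for the real keys goes here ...
}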

JobCounter.TOTAL_LAUNCHED_MAPS was retrieved in the Reducer class with the old MR API using the code below.

// Imports needed at the top of the Reducer source file (old MR API):
// import org.apache.hadoop.mapred.Counters;
// import org.apache.hadoop.mapred.JobClient;
// import org.apache.hadoop.mapred.JobConf;
// import org.apache.hadoop.mapred.JobID;
// import org.apache.hadoop.mapred.RunningJob;
// import org.apache.hadoop.mapreduce.JobCounter;

private String jobID;
private long launchedMaps;

public void configure(JobConf jobConf) {

    try {
        // The framework puts the current job's ID into the job configuration.
        jobID = jobConf.get("mapred.job.id");

        // Connect back to the cluster to look up the running job.
        JobClient jobClient = new JobClient(jobConf);

        RunningJob job = jobClient.getJob(JobID.forName(jobID));

        if (job == null) {
            System.out.println("No job found with ID " + jobID);
        } else {
            // Read the job-level counter for the number of launched map tasks.
            Counters counters = job.getCounters();
            launchedMaps = counters.getCounter(JobCounter.TOTAL_LAUNCHED_MAPS);
        }

    } catch (Exception e) {
        e.printStackTrace();
    }
}

With the new API, Reducer implementations can access the job's Configuration via JobContext#getConfiguration(). The above code can be implemented in Reducer#setup().
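A minimal sketch of the new-API equivalent (assuming the job ID is exposed in the configuration as "mapreduce.job.id"; older releases use "mapred.job.id"):

// Sketch only; these members live inside an org.apache.hadoop.mapreduce.Reducer subclass.
// import java.io.IOException;
// import org.apache.hadoop.conf.Configuration;
// import org.apache.hadoop.mapreduce.Cluster;
// import org.apache.hadoop.mapreduce.Counters;
// import org.apache.hadoop.mapreduce.Job;
// import org.apache.hadoop.mapreduce.JobCounter;
// import org.apache.hadoop.mapreduce.JobID;

private long launchedMaps;

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    String jobId = conf.get("mapreduce.job.id", conf.get("mapred.job.id"));

    // Connect to the cluster and look up the current job to read its counters.
    Cluster cluster = new Cluster(conf);
    Job job = cluster.getJob(JobID.forName(jobId));

    if (job == null) {
        System.out.println("No job found with ID " + jobId);
    } else {
        Counters counters = job.getCounters();
        launchedMaps = counters.findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
    }
}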

Reducer#configure() in the old MR API and Reducer#setup() in the new MR API are each invoked once per reduce task, before Reducer#reduce() is called.

BTW, the counters can also be fetched from a JVM other than the one that launched the job.
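For example, here is a sketch of a separate client program (class name is illustrative; the job ID string is assumed to be known) that reads the same counter with the old-API JobClient:

// Standalone sketch: reads a job's counters from a JVM other than the one that submitted it.
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapreduce.JobCounter;

public class JobCounterFetcher {
    public static void main(String[] args) throws Exception {
        String jobId = args[0];  // e.g. "job_201303010000_0001"

        // Picks up the cluster address from the configuration files on the classpath.
        JobClient jobClient = new JobClient(new JobConf());
        RunningJob job = jobClient.getJob(JobID.forName(jobId));

        if (job != null) {
            Counters counters = job.getCounters();
            System.out.println("Launched maps: "
                    + counters.getCounter(JobCounter.TOTAL_LAUNCHED_MAPS));
        }
    }
}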

JobInProgress is defined as below, so it should not be used. This API is intended only for a limited set of projects, and the interface may change.

@InterfaceAudience.LimitedPrivate({"MapReduce"})
@InterfaceStability.Unstable

Note that JobCounter.TOTAL_LAUNCHED_MAPS also includes map tasks launched due to speculative execution.
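One way to approximate the successful count (a sketch only, not verified across Hadoop versions; killed attempts normally include the losing speculative duplicates) is to subtract failed and killed map attempts from the launched total:

// Sketch: assumes `counters` was obtained as in the snippets above.
long launched = counters.getCounter(JobCounter.TOTAL_LAUNCHED_MAPS);
long failed   = counters.getCounter(JobCounter.NUM_FAILED_MAPS);
long killed   = counters.getCounter(JobCounter.NUM_KILLED_MAPS);
long successfulMaps = launched - failed - killed;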
