How can I access the Mapper/Reducer counters on the Output stage?

Problem description

I created some counters in my Mapper class:

(example written using the appengine-mapreduce Java library v.0.5)

@Override
public void map(Entity entity) {
    getContext().incrementCounter("analyzed");
    if (isSpecial(entity)){
        getContext().incrementCounter("special");
    }
}

(The isSpecial method just returns true or false depending on the state of the entity; it is not relevant to the question.)

I want to access those counters once all processing has finished, in the finish method of the Output class:

@Override
public Summary finish(Collection<? extends OutputWriter<Entity>> writers) {
    //get the counters and save/return the summary
    int analyzed = 0; //getCounter("analyzed");
    int special = 0; //getCounter("special");
    Summary summary = new Summary(analyzed, special);
    save(summary);
    return summary;
}

... but the getCounter method is only available from the MapperContext class, which is accessible only through the getContext() method of Mappers/Reducers.

How can I access my counters at the Output stage?

Side note: I can't send the counter values to the class I output, because the whole Map/Reduce is about transforming one set of Entities into another (in other words, the counters are not the main purpose of the Map/Reduce). The counters are just for control, so it makes sense to compute them here instead of creating another process just to do the counting.

Thanks.

Solution

There is no way to do this inside of Output today, but feel free to request it here: https://code.google.com/p/appengine-mapreduce/issues/list

What you can do, however, is chain a job to run after your map-reduce; that job will receive its output and counters. There is an example of this here: https://code.google.com/p/appengine-mapreduce/source/browse/trunk/java/example/src/com/google/appengine/demos/mapreduce/entitycount/ChainedMapReduceJob.java

The above example runs 3 MapReduce jobs in a row. Note that these don't have to be MapReduce jobs: you can create your own class that extends Job and has a run method that creates your Summary object.
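
To make the chaining idea concrete, here is a minimal, library-free sketch of the pattern: one stage does the transformation and counts as a side effect, and a follow-up stage receives that stage's result and builds the Summary from its counters. Everything in it (StageResult, runMapStage, runSummaryJob, the isSpecial rule, the integer "entities") is a hypothetical stand-in invented for illustration; it is not the appengine-mapreduce API, and the real chained job would extend the library's Job class as in the linked example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of "chain a follow-up job that receives the counters".
 * All types here are hypothetical stand-ins, NOT the appengine-mapreduce API.
 */
public class ChainedSummarySketch {

    /** Stand-in for what a finished stage hands to the next job: output plus counters. */
    public static final class StageResult {
        public final List<Integer> output = new ArrayList<>();
        private final Map<String, Long> counters = new HashMap<>();

        void incrementCounter(String name) {
            counters.merge(name, 1L, Long::sum);
        }

        public long getCounter(String name) {
            return counters.getOrDefault(name, 0L);
        }
    }

    /** Stand-in for the Summary class from the question. */
    public static final class Summary {
        public final long analyzed;
        public final long special;

        Summary(long analyzed, long special) {
            this.analyzed = analyzed;
            this.special = special;
        }
    }

    /** Hypothetical rule; in the question this depends on the entity's state. */
    static boolean isSpecial(int entity) {
        return entity % 3 == 0;
    }

    /** Stage 1: transforms the entities and increments counters along the way. */
    public static StageResult runMapStage(List<Integer> entities) {
        StageResult result = new StageResult();
        for (int entity : entities) {
            result.incrementCounter("analyzed");
            if (isSpecial(entity)) {
                result.incrementCounter("special");
            }
            result.output.add(entity * 10); // placeholder transformation
        }
        return result;
    }

    /** Stage 2: the chained job; it runs after stage 1 and reads its counters. */
    public static Summary runSummaryJob(StageResult mapResult) {
        return new Summary(mapResult.getCounter("analyzed"),
                           mapResult.getCounter("special"));
    }

    /** Runs both stages in order, the way a chained job would. */
    public static Summary runChain(List<Integer> entities) {
        return runSummaryJob(runMapStage(entities));
    }

    public static void main(String[] args) {
        Summary summary = runChain(Arrays.asList(1, 2, 3, 4, 5, 6));
        // prints "analyzed=6, special=2"
        System.out.println("analyzed=" + summary.analyzed + ", special=" + summary.special);
    }
}
```

The point of the design is that the Summary is produced by the second stage rather than by Output.finish, so the transformation itself stays untouched and the counters never have to be written into the output entities.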
