Hadoop Mapper Context object

Problem description

How does the Hadoop framework call the run() method of a mapper or reducer class? The framework calls run(), but run() requires a context object, so how does Hadoop pass that object in? What information does the object hold?

Solution

The run() method is invoked through Java runtime polymorphism (i.e. method overriding). As line 569 of the linked MapTask.java source shows, the extended mapper/reducer class is instantiated through the Java Reflection API. MapTask gets the name of that class from the job configuration object, where the client program stored it by calling job.setMapperClass().
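
The configure-then-instantiate step described above can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions: BaseMapper, WordCountMapper, the Map-based configuration, and the key "example.mapper.class" are all hypothetical stand-ins, not Hadoop's real classes or property names.

```java
import java.util.HashMap;
import java.util.Map;

public class ReflectiveMapperLoading {
    // Hypothetical stand-in for Hadoop's Mapper base class.
    static class BaseMapper {
        String describe() { return "BaseMapper"; }
    }

    // Hypothetical user-defined mapper that extends the base class.
    static class WordCountMapper extends BaseMapper {
        @Override
        String describe() { return "WordCountMapper"; }
    }

    // Made-up configuration key (an assumption, not Hadoop's property name).
    static final String MAPPER_CLASS_KEY = "example.mapper.class";

    // Client side: record the class name, playing the role of job.setMapperClass().
    static void setMapperClass(Map<String, String> conf, Class<? extends BaseMapper> cls) {
        conf.put(MAPPER_CLASS_KEY, cls.getName());
    }

    // Framework side: read the name back, load the class, and instantiate it reflectively.
    static BaseMapper instantiateMapper(Map<String, String> conf)
            throws ReflectiveOperationException {
        String className = conf.get(MAPPER_CLASS_KEY);
        Class<? extends BaseMapper> cls =
                Class.forName(className).asSubclass(BaseMapper.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        setMapperClass(conf, WordCountMapper.class);
        BaseMapper mapper = instantiateMapper(conf);
        System.out.println(mapper.describe()); // prints "WordCountMapper"
    }
}
```

The framework code only ever sees the BaseMapper type; which subclass it ends up holding is decided by the string stored in the configuration.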

The following code is taken from MapTask.java in the Hadoop source:

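    // Create the Mapper.Context reflectively from the task's collaborators.
    mapperContext = contextConstructor.newInstance(mapper, job, getTaskID(),
                                                   input, output, committer,
                                                   reporter, split);

    input.initialize(split, mapperContext);  // prepare the record reader for this split
    mapper.run(mapperContext);               // polymorphic call into the configured mapper
    input.close();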

Line 621 is an example of runtime polymorphism. On this line, MapTask calls the run() method of the configured mapper, passing the Mapper context as the argument. If the mapper does not override run(), the call resolves to run() in org.apache.hadoop.mapreduce.Mapper, which in turn calls the configured mapper's map() method for each input key/value pair.
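
A reduced model of that dispatch, as a sketch only: SimpleMapper and SimpleContext below are hypothetical stand-ins for Mapper and Mapper.Context, and the key is simplified to a record index. The loop in run() follows the shape of the default Mapper.run() (setup, then map() per record, then cleanup).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RunDispatchSketch {
    // Hypothetical, much-reduced stand-in for Mapper.Context.
    static class SimpleContext {
        private final Iterator<String> records;
        private String current;
        private long index = -1; // simplified key: position of the record
        final List<String> output = new ArrayList<>();

        SimpleContext(List<String> input) { this.records = input.iterator(); }

        boolean nextKeyValue() {
            if (!records.hasNext()) return false;
            current = records.next();
            index++;
            return true;
        }
        long getCurrentKey() { return index; }
        String getCurrentValue() { return current; }
        void write(String key, String value) { output.add(key + "=" + value); }
    }

    // Hypothetical stand-in for org.apache.hadoop.mapreduce.Mapper.
    static class SimpleMapper {
        void setup(SimpleContext context) { }
        void cleanup(SimpleContext context) { }

        // Default map(): pass the record through unchanged.
        void map(long key, String value, SimpleContext context) {
            context.write(Long.toString(key), value);
        }

        // Default run(): drives the loop and calls whichever map() the subclass provides.
        void run(SimpleContext context) {
            setup(context);
            try {
                while (context.nextKeyValue()) {
                    map(context.getCurrentKey(), context.getCurrentValue(), context);
                }
            } finally {
                cleanup(context);
            }
        }
    }

    // User mapper that overrides map() but not run().
    static class UpperCaseMapper extends SimpleMapper {
        @Override
        void map(long key, String value, SimpleContext context) {
            context.write(Long.toString(key), value.toUpperCase());
        }
    }

    // Framework side: knows only the base type, as MapTask does.
    static List<String> runTask(SimpleMapper mapper, List<String> input) {
        SimpleContext context = new SimpleContext(input);
        mapper.run(context);
        return context.output;
    }

    public static void main(String[] args) {
        System.out.println(runTask(new UpperCaseMapper(), Arrays.asList("a", "b")));
        // prints [0=A, 1=B]
    }
}
```

Because runTask() calls run() on the base type, the inherited run() executes, and its call to map() is resolved at run time to UpperCaseMapper.map().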

On line 616 of the same source, MapTask creates the context object, which carries the job configuration and the other details mentioned by @harpun, and then passes it to the run() method on line 621.
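
To make the context's role concrete, here is a minimal sketch of building such an object through a reflective constructor call and reading it back. SimpleContext is a hypothetical class that keeps only three of the things the real Mapper.Context exposes (via getConfiguration(), getTaskAttemptID() and getInputSplit()); the string values and the "example.separator" key are made up for illustration.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

public class ContextConstructionSketch {
    // Hypothetical stand-in for Mapper.Context with a small subset of its contents.
    static class SimpleContext {
        private final Map<String, String> configuration;
        private final String taskAttemptId;
        private final String inputSplit;

        SimpleContext(Map<String, String> configuration, String taskAttemptId,
                      String inputSplit) {
            this.configuration = configuration;
            this.taskAttemptId = taskAttemptId;
            this.inputSplit = inputSplit;
        }

        Map<String, String> getConfiguration() { return configuration; }
        String getTaskAttemptID() { return taskAttemptId; }
        String getInputSplit() { return inputSplit; }
    }

    // Framework side: look up the constructor by parameter types, then invoke it,
    // in the spirit of contextConstructor.newInstance(...) in the MapTask excerpt.
    static SimpleContext createContext(Map<String, String> job, String taskId,
                                       String split)
            throws ReflectiveOperationException {
        Constructor<SimpleContext> constructor =
                SimpleContext.class.getDeclaredConstructor(
                        Map.class, String.class, String.class);
        return constructor.newInstance(job, taskId, split);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> job = new HashMap<>();
        job.put("example.separator", ","); // made-up job setting

        SimpleContext context =
                createContext(job, "attempt_example_m_000000_0", "split-0");

        // Inside map(), user code would read job settings through the context.
        System.out.println(context.getConfiguration().get("example.separator"));
        System.out.println(context.getTaskAttemptID());
        System.out.println(context.getInputSplit());
    }
}
```

In real jobs the same object also gives access to the current record, the output writer, counters and status reporting, which this sketch leaves out.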

The same explanation applies to reduce tasks, with ReduceTask as the corresponding entry class.
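
The reduce side differs mainly in that run() advances one key at a time and hands reduce() all values for that key. A rough sketch, assuming hypothetical SimpleReducer and SimpleReduceContext classes and input that has already been grouped by key (setup and cleanup are omitted for brevity):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceRunSketch {
    // Hypothetical stand-in for Reducer.Context over already-grouped input.
    static class SimpleReduceContext {
        private final Iterator<Map.Entry<String, List<Integer>>> groups;
        private Map.Entry<String, List<Integer>> current;
        final List<String> output = new ArrayList<>();

        SimpleReduceContext(Map<String, List<Integer>> grouped) {
            // Copy into a TreeMap so keys are visited in sorted order.
            this.groups = new TreeMap<String, List<Integer>>(grouped).entrySet().iterator();
        }

        boolean nextKey() {
            if (!groups.hasNext()) return false;
            current = groups.next();
            return true;
        }
        String getCurrentKey() { return current.getKey(); }
        Iterable<Integer> getValues() { return current.getValue(); }
        void write(String key, int value) { output.add(key + "=" + value); }
    }

    // Hypothetical stand-in for org.apache.hadoop.mapreduce.Reducer.
    static class SimpleReducer {
        // Default reduce(): emit every value unchanged.
        void reduce(String key, Iterable<Integer> values, SimpleReduceContext context) {
            for (int value : values) context.write(key, value);
        }

        // Default run(): one reduce() call per key.
        void run(SimpleReduceContext context) {
            while (context.nextKey()) {
                reduce(context.getCurrentKey(), context.getValues(), context);
            }
        }
    }

    // User reducer that overrides reduce() only.
    static class SumReducer extends SimpleReducer {
        @Override
        void reduce(String key, Iterable<Integer> values, SimpleReduceContext context) {
            int sum = 0;
            for (int value : values) sum += value;
            context.write(key, sum);
        }
    }

    // Framework side: builds the context and calls run() on the base type.
    static List<String> runTask(SimpleReducer reducer, Map<String, List<Integer>> grouped) {
        SimpleReduceContext context = new SimpleReduceContext(grouped);
        reducer.run(context);
        return context.output;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("a", Arrays.asList(1, 2));
        grouped.put("b", Arrays.asList(3));
        System.out.println(runTask(new SumReducer(), grouped)); // prints [a=3, b=3]
    }
}
```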
