What is the difference between JobClient.java and JobSubmitter.java in hadoop2?


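Problem description

Which of these is used to submit a job for execution in the job tracker? It would be great if someone could explain how both of these classes are used in different use cases.

Solution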


Question 1: JobClient

In the new API, job control is done through the Job class rather than the old JobClient class.

Job is the job submitter's view of the job.

It allows the user to configure the job, submit it, control its execution, and query its state. The set methods only work until the job is submitted; afterwards they throw an IllegalStateException.

Normally the user creates the application, describes the various facets of the job via Job, and then submits the job and monitors its progress.
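To make the "setters only work before submission" rule concrete, here is a minimal plain-Java sketch. MiniJob, its JobState enum and its methods are hypothetical stand-ins, not org.apache.hadoop.mapreduce.Job; the only Hadoop detail borrowed is the configuration key mapreduce.job.name. It assumes a simple two-state lifecycle (DEFINE before submission, RUNNING after) and never contacts a cluster.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.mapreduce.Job that models only
// the "configure, then submit" lifecycle; it never talks to a cluster.
class MiniJob {
    enum JobState { DEFINE, RUNNING }

    private JobState state = JobState.DEFINE;
    private final Map<String, String> conf = new HashMap<>();

    // Every setter and submit() call checks that the job is still being defined.
    private void ensureState(JobState expected) {
        if (state != expected) {
            throw new IllegalStateException("Job in state " + state + ", expected " + expected);
        }
    }

    void setJobName(String name) {
        ensureState(JobState.DEFINE);
        conf.put("mapreduce.job.name", name);
    }

    String getJobName() {
        return conf.get("mapreduce.job.name");
    }

    void submit() {
        ensureState(JobState.DEFINE);
        // The real Job would hand the work to a JobSubmitter at this point.
        state = JobState.RUNNING;
    }

    JobState getState() {
        return state;
    }

    public static void main(String[] args) {
        MiniJob job = new MiniJob();
        job.setJobName("wordcount");
        job.submit();
        try {
            job.setJobName("too-late"); // rejected: the job is already RUNNING
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With the real class, the equivalent flow is roughly Job.getInstance(conf), the various set...() calls, then waitForCompletion(true); calling a setter after that point is what produces the IllegalStateException.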

Question 2: JobSubmitter

The submit() method on Job creates an internal JobSubmitter instance and calls submitJobInternal() on it.

Once the job has been submitted, waitForCompletion() polls the job's progress once per second and reports it to the console. When the job completes successfully, the job counters are displayed. Otherwise, the error that caused the job to fail is logged to the console.
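The monitoring loop can be sketched without Hadoop as well. This is an assumption-laden model, not Job's actual implementation: JobProgress is a hypothetical interface whose method names mirror Job's mapProgress(), reduceProgress() and isComplete(), the report format is made up, and the poll interval is a parameter so a test can run it instantly (the answer describes one second). Counter display and failure logging are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical view of a running job; the method names mirror Job's,
// but this is not a Hadoop type.
interface JobProgress {
    boolean isComplete();
    float mapProgress();
    float reduceProgress();
}

class ProgressMonitor {
    // Polls every intervalMillis (1000 for "once per second") and prints a
    // report line only when it differs from the previous one.
    static List<String> monitor(JobProgress job, long intervalMillis) throws InterruptedException {
        List<String> reports = new ArrayList<>();
        String last = "";
        while (true) {
            boolean done = job.isComplete(); // read first so the final 100% line is still reported
            String report = String.format(Locale.ROOT, "map %.0f%% reduce %.0f%%",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            if (!report.equals(last)) {
                System.out.println(report);
                reports.add(report);
                last = report;
            }
            if (done) {
                return reports;
            }
            Thread.sleep(intervalMillis);
        }
    }

    // Test double that replays {map, reduce} progress values, one step per poll,
    // and reports completion on the last step.
    static JobProgress scripted(float[][] script) {
        return new JobProgress() {
            private int step = -1;

            public boolean isComplete() {
                step = Math.min(step + 1, script.length - 1);
                return step == script.length - 1;
            }

            public float mapProgress() { return script[step][0]; }

            public float reduceProgress() { return script[step][1]; }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        float[][] script = {{0f, 0f}, {0.5f, 0f}, {0.5f, 0f}, {1f, 0.5f}, {1f, 1f}};
        monitor(scripted(script), 1000); // sleeps four times, so about four seconds
    }
}
```

Printing only on change keeps the console quiet while a phase is stuck at the same percentage, which is why the repeated 50% step above produces a single line.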

The job submission process implemented by JobSubmitter does the following:

1. Asks the resource manager for a new application ID, which is used as the MapReduce job ID.

2. Checks the output specification of the job. For example, if the output directory has not been specified or it already exists, the job is not submitted and an error is thrown to the MapReduce program.

3. Computes the input splits for the job. If the splits cannot be computed (because the input paths don't exist, for example), the job is not submitted and an error is thrown to the MapReduce program.

4. Copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the shared filesystem in a directory named after the job ID. The job JAR is copied with a high replication factor (controlled by the mapreduce.client.submit.file.replication property, which defaults to 10) so that there are lots of copies across the cluster for the node managers to access when they run tasks for the job.

5. Submits the job by calling submitApplication() on the resource manager.
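The numbered steps can be modelled as a small Hadoop-free sketch that shows where a submission gets rejected. Everything here is hypothetical: the resource manager is an ID counter plus a list of submitted applications, the shared filesystem is a map of paths to replication factors, the /staging root, the default replication of 3 and the exception type are made up, and one split per input file is assumed. Steps run in the order of the list above, which is not necessarily the order of calls inside the real submitJobInternal().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, Hadoop-free model of the submission steps listed above.
// Paths, ID details and exception types are illustrative only.
class SubmitSketch {
    // Mirrors the stated default of mapreduce.client.submit.file.replication.
    static final int SUBMIT_FILE_REPLICATION = 10;
    // Assumed replication for the other files (the text only gives the JAR's rule).
    static final int DEFAULT_REPLICATION = 3;

    static class FakeResourceManager {
        private int next = 1;
        final List<String> submitted = new ArrayList<>();

        String newApplicationId() {
            // "0" stands in for the cluster timestamp.
            return String.format(Locale.ROOT, "application_0_%04d", next++);
        }

        void submitApplication(String appId) {
            submitted.add(appId);
        }
    }

    static class FakeSharedFs {
        final Map<String, Integer> replication = new HashMap<>(); // path -> replication factor

        boolean exists(String path) { return replication.containsKey(path); }

        void write(String path, int factor) { replication.put(path, factor); }
    }

    static String submitJob(FakeResourceManager rm, FakeSharedFs fs,
                            List<String> inputs, String outputDir) {
        // 1. New application ID from the resource manager, reused as the job ID.
        String appId = rm.newApplicationId();
        String jobId = "job_" + appId.substring("application_".length());

        // 2. Output specification: must be set and must not exist yet.
        if (outputDir == null) {
            throw new IllegalArgumentException("Output directory not set");
        }
        if (fs.exists(outputDir)) {
            throw new IllegalArgumentException("Output directory " + outputDir + " already exists");
        }

        // 3. Input splits (assumed one per file): missing inputs abort submission.
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("No input paths specified");
        }
        for (String in : inputs) {
            if (!fs.exists(in)) {
                throw new IllegalArgumentException("Input path does not exist: " + in);
            }
        }

        // 4. Copy JAR, configuration and splits into a directory named after the job ID.
        String jobDir = "/staging/" + jobId;
        fs.write(jobDir + "/job.jar", SUBMIT_FILE_REPLICATION); // widely replicated JAR
        fs.write(jobDir + "/job.xml", DEFAULT_REPLICATION);
        fs.write(jobDir + "/job.split", DEFAULT_REPLICATION);

        // 5. Hand the application to the resource manager.
        rm.submitApplication(appId);
        return jobId;
    }

    public static void main(String[] args) {
        FakeResourceManager rm = new FakeResourceManager();
        FakeSharedFs fs = new FakeSharedFs();
        fs.write("/in/a.txt", DEFAULT_REPLICATION);
        System.out.println(submitJob(rm, fs, Arrays.asList("/in/a.txt"), "/out"));
    }
}
```

Because step 1 runs first in this sketch, a rejected submission still consumes an application ID, but nothing is copied to the staging directory and nothing reaches submitApplication().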

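Hadoop: The Definitive Guide (fourth edition) is one of the best books for understanding these concepts.

On the code side, you can follow the source code on grepcode:

Job: API to check: waitForCompletion() => submit() => JobSubmitter.submitJobInternal()

JobClient (old org.apache.hadoop.mapred API): submitJobInternal(), which in Hadoop 2 wraps the JobConf in a Job and calls submit(), so it also ends up in JobSubmitter.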
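Putting the two answers together, here is a rough sketch of the layering: Job (and, in the old API, JobClient) is what user code calls, while JobSubmitter is an internal helper that Job creates for each submission. The classes below are hypothetical stand-ins that only record the call chain; the old-API path assumes Hadoop 2's JobClient hands the work to a new-API Job, and no job is actually run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that only record who calls whom; none of these are
// Hadoop classes, and no job is actually run.
class CallChainSketch {
    static final List<String> trace = new ArrayList<>();

    // Analogous to JobSubmitter: internal, does the actual submission work.
    static class Submitter {
        void submitJobInternal(NewApiJob job) {
            trace.add("JobSubmitter.submitJobInternal(" + job.name + ")");
        }
    }

    // Analogous to the new-API org.apache.hadoop.mapreduce.Job that users program against.
    static class NewApiJob {
        final String name;

        NewApiJob(String name) {
            this.name = name;
        }

        void submit() {
            trace.add("Job.submit()");
            new Submitter().submitJobInternal(this); // Job creates its own submitter
        }

        boolean waitForCompletion() {
            trace.add("Job.waitForCompletion()");
            submit();
            return true; // progress polling omitted in this sketch
        }
    }

    // Analogous to the old-API org.apache.hadoop.mapred.JobClient.
    static class OldApiClient {
        void submitJob(String name) {
            trace.add("JobClient.submitJob()");
            submitJobInternal(name);
        }

        // Assumed Hadoop 2 behaviour: wrap the old-style config in a new-API job and submit it.
        void submitJobInternal(String name) {
            trace.add("JobClient.submitJobInternal()");
            new NewApiJob(name).submit();
        }
    }

    public static void main(String[] args) {
        new NewApiJob("wordcount").waitForCompletion();
        new OldApiClient().submitJob("legacy");
        trace.forEach(System.out::println);
    }
}
```

Either entry point ends in the same internal submitter, which is the practical answer to "which one submits the job": user code picks Job (or legacy JobClient), and JobSubmitter is never called directly.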

