Measure Hadoop job time using JobControl


Question

I used to launch my Hadoop job with the following code:

long start = new Date().getTime();
boolean status = job.waitForCompletion(true);            
long end = new Date().getTime();

This way I could measure, directly in my code, how long the job took once it finished.

Now I have to use the JobControl in order to express dependencies between my jobs:

JobControl jobControl = new JobControl("MyJob");
jobControl.addJob(job1);
jobControl.addJob(job2);
job3.addDependingJob(job2);
jobControl.addJob(job3);

jobControl.run();

However, once jobControl.run() is called, execution never gets past it, so I cannot add code that polls jobControl.getState() to detect when the jobs have completed.

How can I measure the time taken by a job using JobControl?

Solution

JobControl has no built-in hook for getting this information. You have some (potentially painful) options to try:

  • Start JobControl.run() in a separate thread, and in your main thread, poll the JobControl.getXXXJobs() methods to track when jobs change state
  • Look into using the Job End Notification URL hook, but this will require you to start a 'server' in your client to receive the notification events, and then try to work backwards from when a job ends
  • Extend the JobControl and jobcontrol.Job objects to track when a job changes state and add methods to query the start / end times
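
As a rough illustration of the first option, here is a minimal Java sketch of the thread-plus-polling pattern. So that it runs without a cluster, it does not use Hadoop classes: `Controller`, `FakeController`, `JobTimer` and `Demo` are hypothetical names, and `Controller` is only a stand-in for the few JobControl operations the pattern needs. In real code you would presumably back it with the getXXXJobs() methods mentioned above, plus JobControl's allFinished() and stop(); exact method names differ between Hadoop versions, so check the API you are using.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the parts of JobControl this sketch relies on.
interface Controller extends Runnable {
    List<String> runningJobNames();   // real code: derive from a getXXXJobs() method
    List<String> finishedJobNames();  // real code: successful plus failed jobs
    boolean allFinished();            // JobControl offers allFinished()
    void stop();                      // JobControl offers stop(); run() loops until then
}

class JobTimer {
    // Runs the controller in a background thread, polls it from the calling
    // thread, and returns approximate elapsed milliseconds per job name.
    static Map<String, Long> timeJobs(Controller control, long pollMillis) {
        Map<String, Long> started = new LinkedHashMap<>();
        Map<String, Long> elapsed = new LinkedHashMap<>();
        Thread runner = new Thread(control, "job-control");
        runner.setDaemon(true);
        runner.start();
        try {
            boolean done = false;
            while (!done) {
                // Read the flag first so the final pass still samples the end state.
                done = control.allFinished();
                long now = System.nanoTime() / 1_000_000; // monotonic clock, in ms
                for (String name : control.runningJobNames()) {
                    started.putIfAbsent(name, now);
                }
                for (String name : control.finishedJobNames()) {
                    started.putIfAbsent(name, now); // started and ended between polls
                    elapsed.putIfAbsent(name, now - started.get(name));
                }
                if (!done) {
                    Thread.sleep(pollMillis);
                }
            }
            control.stop();
            runner.join();
        } catch (InterruptedException e) {
            control.stop();
            Thread.currentThread().interrupt();
        }
        return elapsed;
    }
}

// Fake controller for the demo: two "jobs" that run one after another by sleeping.
class FakeController implements Controller {
    private final Set<String> running = ConcurrentHashMap.newKeySet();
    private final Set<String> finished = ConcurrentHashMap.newKeySet();
    private volatile boolean complete = false;
    private volatile boolean stopped = false;

    public void run() {
        try {
            for (String name : new String[] {"job1", "job2"}) {
                running.add(name);
                Thread.sleep(50);      // pretend the job is doing work
                finished.add(name);    // mark finished before leaving the running set
                running.remove(name);
            }
            complete = true;
            while (!stopped) {
                Thread.sleep(5);       // keep looping until stop() is called
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public List<String> runningJobNames() { return new ArrayList<>(running); }
    public List<String> finishedJobNames() { return new ArrayList<>(finished); }
    public boolean allFinished() { return complete; }
    public void stop() { stopped = true; }
}

class Demo {
    public static void main(String[] args) {
        Map<String, Long> times = JobTimer.timeJobs(new FakeController(), 10);
        times.forEach((name, ms) -> System.out.println(name + " took about " + ms + " ms"));
    }
}
```

Times measured this way are only accurate to within the polling interval, and they reflect what the client observes rather than what the cluster records, so treat them as estimates.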
