Submitting a Hadoop job


Problem description

I need to constantly get the mappers' and reducers' running time. I have submitted the job as follows.

 JobClient jobclient = new JobClient(conf);
 RunningJob runjob = jobclient.submitJob(conf);          


 TaskReport [] maps = jobclient.getMapTaskReports(runjob.getID());

 long mapDuration = 0;
 for(TaskReport rpt: maps){
    mapDuration += rpt.getFinishTime() - rpt.getStartTime();
 }

However when I run the program, it seems like the job is not submitted and the mapper never starts. How can I use JobClient.runJob(conf) and still be able to get the running time?

Solution

The submitJob() method returns control to the calling program immediately, without waiting for the Hadoop job to start, much less complete. If you want to wait, use the waitForCompletion() method, which returns only after the Hadoop job has finished. I think you want something in between, since you want to run subsequent code after the submit but before the job completes.

I suggest you put your follow-on code in a loop that continues until the job is complete (use the isComplete() method for that test) and observe the mappers and reducers as the job progresses. You probably want to put a Thread.sleep(xxx) somewhere in the loop, too, so that it does not poll continuously.
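The polling loop described above can be sketched roughly as follows. To keep the example runnable without a cluster, it uses a hypothetical JobHandle interface and a FakeJob class in place of the RunningJob returned by submitJob(); only isComplete() and mapProgress() mirror real RunningJob methods (which additionally throw IOException), and everything else here is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class PollingSketch {

    /** Hypothetical stand-in for the two RunningJob calls used in the loop. */
    interface JobHandle {
        boolean isComplete();
        float mapProgress();
    }

    /** Fake job (illustration only) that completes after a fixed number of progress checks. */
    static class FakeJob implements JobHandle {
        private final int pollsNeeded;
        private int polls = 0;

        FakeJob(int pollsNeeded) {
            this.pollsNeeded = pollsNeeded;
        }

        public boolean isComplete() {
            return polls >= pollsNeeded;
        }

        public float mapProgress() {
            polls++;
            return Math.min(1.0f, (float) polls / pollsNeeded);
        }
    }

    /** Polls until the job reports completion and returns the progress samples seen. */
    static List<Float> pollUntilComplete(JobHandle job, long sleepMillis) {
        List<Float> samples = new ArrayList<>();
        while (!job.isComplete()) {
            // Follow-on code goes here: inspect progress, task reports, and so on.
            samples.add(job.mapProgress());
            try {
                Thread.sleep(sleepMillis); // avoid hammering the job tracker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop polling
                break;
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<Float> samples = pollUntilComplete(new FakeJob(4), 10);
        System.out.println("Observed " + samples.size() + " progress samples: " + samples);
    }
}
```

With a real job you would pass the RunningJob calls through directly and handle the IOException they declare; the sleep interval is a trade-off between responsiveness and load on the cluster.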

To respond to your comment, you want to...

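// "job" is the RunningJob returned by submitJob()
job.waitForCompletion();
// getTaskCompletionEvents() takes a start index; a single call may return only a batch of events
TaskCompletionEvent[] events = job.getTaskCompletionEvents(0);
for (int i = 0; i < events.length; i++) {
    System.out.println("Task " + i + " took " + events[i].getTaskRunTime() + " ms");
}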
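If you want a single total like the mapDuration in the question instead of per-task lines, a summing step along these lines might work. This is only a sketch: TaskTimes is a hypothetical stand-in for the start and finish times a TaskReport exposes through getStartTime() and getFinishTime(), and it assumes that a task which has not finished yet reports a finish time of 0.

```java
public class DurationSketch {

    /** Hypothetical stand-in for the start/finish times carried by a TaskReport. */
    static class TaskTimes {
        final long startTime;
        final long finishTime;

        TaskTimes(long startTime, long finishTime) {
            this.startTime = startTime;
            this.finishTime = finishTime;
        }
    }

    /** Sums finish - start over finished tasks only. */
    static long totalDurationMillis(TaskTimes[] tasks) {
        long total = 0;
        for (TaskTimes t : tasks) {
            // Assumption: an unfinished task reports finishTime == 0, so skip it
            // rather than adding a large negative number to the total.
            if (t.finishTime > 0 && t.finishTime >= t.startTime) {
                total += t.finishTime - t.startTime;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        TaskTimes[] maps = {
            new TaskTimes(1000, 4000), // finished, 3000 ms
            new TaskTimes(1500, 3500), // finished, 2000 ms
            new TaskTimes(2000, 0)     // still running, skipped
        };
        System.out.println("Total map time: " + totalDurationMillis(maps) + " ms"); // prints 5000 ms
    }
}
```

Skipping unfinished tasks matters mostly if you compute the total inside the polling loop; after waitForCompletion() returns, every task should have a finish time.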
