How to find the CPU time taken by a Map/Reduce task in Hadoop

Question

I am writing a Hadoop scheduler. My scheduling requires finding the CPU time taken by each Map/Reduce task.

I know that:

  • The TaskInProgress class maintains the execStartTime and execFinishTime values which are wall-clock times when the process started and finished, but they do not accurately indicate the CPU time consumed by the task.

Each task is executed in a new JVM, so I could use the OperatingSystemMXBean.getProcessCpuTime() method. However, the description of that method says: "Returns the CPU time used by the process on which the Java virtual machine is running in nanoseconds". I am not entirely clear whether this is what I want.

Recommended answer

Just for posterity, I solved this problem by making a change in src/mapred/org/apache/hadoop/mapred/TaskLog.java (Hadoop 0.20.203) on line 572:

mergedCmd.append("exec setsid 'time' ");    // add 'time'

The CPU time will be written to logs/userlogs/JOBID/TASKID/stderr. I also wrote a script to reap the cumulative CPU time: https://gist.github.com/1984365

Before running the job, make sure you run:

rm -rf logs/userlogs/*

so that the script works.
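
For readers who cannot reach the gist, here is a minimal sketch of how such a summing script might look. It is not the author's script. It assumes the `time` invoked by the patched command is GNU time writing its default format (for example "1.23user 0.45system 0:02.10elapsed ..."), and that the logs follow the logs/userlogs/JOBID/TASKID/stderr layout described above. The function names cpu_seconds and total_cpu_seconds are hypothetical.

```python
import re
from pathlib import Path

# Assumption: GNU time default output, e.g.
# "1.23user 0.45system 0:02.10elapsed 80%CPU ..."
TIME_LINE = re.compile(r"(\d+(?:\.\d+)?)user\s+(\d+(?:\.\d+)?)system")


def cpu_seconds(stderr_text):
    """Return user + system CPU seconds from the last timing line, or None."""
    matches = TIME_LINE.findall(stderr_text)
    if not matches:
        return None
    # time prints its summary when the task exits, so take the last match.
    user, system = matches[-1]
    return float(user) + float(system)


def total_cpu_seconds(userlogs_dir):
    """Sum CPU seconds over every JOBID/TASKID/stderr file under userlogs_dir."""
    total = 0.0
    for stderr_file in Path(userlogs_dir).glob("*/*/stderr"):
        seconds = cpu_seconds(stderr_file.read_text(errors="replace"))
        if seconds is not None:
            total += seconds
    return total
```

Calling total_cpu_seconds("logs/userlogs") after a job finishes would give the cumulative figure. Because the sum covers every stderr file under that directory, logs left over from earlier jobs would be counted too, which is presumably why the directory is cleared first. If `time` resolves to a different implementation on your system, the regular expression would need to be adapted.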
