How to find the CPU time taken by a Map/Reduce task in Hadoop
Question
I am writing a Hadoop scheduler. My scheduling policy needs the CPU time taken by each Map/Reduce task.
I know that:
- The TaskInProgress class maintains the execStartTime and execFinishTime values, which are the wall-clock times at which the process started and finished, but they do not accurately indicate the CPU time consumed by the task.
- Each task is executed in a new JVM, so I could use the OperatingSystemMXBean.getProcessCpuTime() method. However, the description of the method says: "Returns the CPU time used by the process on which the Java virtual machine is running in nanoseconds". I am not entirely sure this is what I want.
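For reference, getProcessCpuTime() reports CPU time for the whole JVM process. With one task per JVM, that figure covers the task's work plus JVM overhead such as startup and garbage collection. The sketch below shows one way to read it; the class name ProcessCpuTimeProbe and the busy loop are hypothetical, and the method is only reachable through the com.sun.management extension of the bean, which not every JVM provides.

```java
import java.lang.management.ManagementFactory;

// Hypothetical probe class, for illustration only (not part of Hadoop).
public class ProcessCpuTimeProbe {

    /** Returns the CPU time of this JVM process in nanoseconds, or -1 if unavailable. */
    static long processCpuTimeNanos() {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        // getProcessCpuTime() is declared on the com.sun.management sub-interface,
        // so the platform bean has to be checked and cast first.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = processCpuTimeNanos();
        long acc = 0;
        for (int i = 0; i < 50000000; i++) {
            acc += i % 7; // burn some CPU so the difference is visible
        }
        long after = processCpuTimeNanos();
        System.out.println("busy-loop result: " + acc);
        System.out.println("process CPU time used: " + (after - before) / 1000000 + " ms");
    }
}
```

The value counts all threads in the process, so it is an upper bound on what the task's own code consumed.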
Recommended answer
Just for posterity: I solved this problem by changing line 572 of src/mapred/org/apache/hadoop/mapred/TaskLog.java (Hadoop 0.20.203) to:
mergedCmd.append("exec setsid 'time' "); // add 'time'
The CPU time is then written to logs/userlogs/JOBID/TASKID/stderr. I also wrote a script to collect the cumulative CPU time: https://gist.github.com/1984365
Before running the job, make sure you run:
rm -rf logs/userlogs/*
so that the script only sees logs from the current run.
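The linked script is not reproduced here. As a rough sketch of the same idea, the Java program below walks logs/userlogs, reads every file named stderr, and adds up the user and system seconds. It assumes that the 'time' in the patched command resolves to GNU time with its default output format (a line such as `0.12user 0.03system 0:00.20elapsed ...`); the class name CpuTimeSum and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper, not the script from the gist.
public class CpuTimeSum {

    // Assumed format: GNU time's default line, e.g. "1.50user 0.25system ...".
    private static final Pattern TIME_LINE = Pattern.compile(
            "([0-9]+(?:\\.[0-9]+)?)user\\s+([0-9]+(?:\\.[0-9]+)?)system");

    /** Sums user + system seconds over all matching lines; 0 if none match. */
    static double cpuSeconds(List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            Matcher m = TIME_LINE.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1)) + Double.parseDouble(m.group(2));
            }
        }
        return total;
    }

    /** Sums CPU seconds over every regular file named "stderr" under the given root. */
    static double sumUnder(Path userlogs) throws IOException {
        double total = 0.0;
        // FOLLOW_LINKS in case task directories are symbolic links.
        try (Stream<Path> paths = Files.walk(userlogs, FileVisitOption.FOLLOW_LINKS)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path name = p.getFileName();
                if (name != null && name.toString().equals("stderr") && Files.isRegularFile(p)) {
                    // ISO-8859-1 decodes any byte sequence, so stray bytes cannot abort the scan.
                    total += cpuSeconds(Files.readAllLines(p, StandardCharsets.ISO_8859_1));
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "logs/userlogs");
        System.out.printf("total CPU seconds: %.2f%n", sumUnder(root));
    }
}
```

Run it from the Hadoop directory after the job finishes, for example `java CpuTimeSum logs/userlogs`. Any task output on stderr that happens to match the same pattern would be counted as well, so the total is only as reliable as the assumed format.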