Hadoop Task progress
Question
I need to calculate the progress of each map task running on all nodes in a Hadoop cluster. I was thinking of dividing the size of the processed data by the size of the whole input data, but I am not sure how to get this information for a task.
I see that the TaskStatus class has a method getProgress(), but there is no description for it. Does it provide the value that I need?
Recommended answer
For a map task, yes: getProgress() returns how far the mapper has progressed through its input, as a fraction between 0 and 1. For reduce tasks, the calculation is less straightforward, because the reported value combines several phases rather than a single position in the input. This article has a pretty good explanation.
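
To make the arithmetic concrete, here is a minimal sketch of how the two progress values are commonly described. It does not call Hadoop at all: TaskProgressSketch, mapProgress and reduceProgress are hypothetical names made up for illustration. The map formula assumes a reader that tracks byte offsets within its input split, and the reduce formula assumes the copy, sort and reduce phases are weighted equally at one third each; check both against the Hadoop version you run.

```java
// Hypothetical helper, not part of Hadoop: it only mirrors the arithmetic.
final class TaskProgressSketch {

    // Map progress: fraction of the input split consumed so far.
    // start and end are the split's byte offsets; pos is the reader's current offset.
    static float mapProgress(long start, long end, long pos) {
        if (end <= start) {
            return 0.0f; // empty split: nothing to measure
        }
        float fraction = (pos - start) / (float) (end - start);
        // Clamp to [0, 1]; a reader can run slightly past the end of its split.
        return Math.max(0.0f, Math.min(1.0f, fraction));
    }

    // Reduce progress (assumption): each of the three phases, given as a
    // fraction in [0, 1], contributes one third of the total.
    static float reduceProgress(float copy, float sort, float reduce) {
        return (copy + sort + reduce) / 3.0f;
    }

    public static void main(String[] args) {
        // 32 of 128 bytes read
        System.out.println(mapProgress(0L, 128L, 32L));
        // copy finished, sort half done, reduce not started
        System.out.println(reduceProgress(1.0f, 0.5f, 0.0f));
    }
}
```

This is why dividing processed bytes by total input bytes matches what a map task reports, while the same ratio would misrepresent a reduce task that is still copying or sorting.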