Why can't we calculate job execution time in Hadoop?


Question

My question is related to the straggler problem. In short, it's an algorithm: we know its complexity, so we can calculate its running time when it is executed on a fixed set of data.

Why can't we obtain the job execution time in Hadoop?

If we could obtain the job execution time or the per-task execution times, we could identify straggler tasks quickly, without needing an algorithm to work out which task is the straggler.
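The detection step described above really is trivial once per-task times are available. A minimal sketch (assuming the task durations have already been collected from the job's counters or task reports; the function name and the 1.5× threshold are illustrative choices, not Hadoop's own heuristic):

```python
from statistics import median

def find_stragglers(task_durations, threshold=1.5):
    """Flag tasks whose run time exceeds threshold x the median duration.

    task_durations: dict mapping task id -> duration in seconds.
    Returns the list of task ids considered stragglers.
    """
    typical = median(task_durations.values())
    return [tid for tid, d in task_durations.items() if d > threshold * typical]

durations = {"task_0": 42.0, "task_1": 45.5, "task_2": 44.1, "task_3": 120.3}
print(find_stragglers(durations))  # -> ['task_3']
```

The point of the question is exactly this: the comparison against the median is easy, but the durations are only known after the tasks have run, not in advance.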

Answer

You should not try to estimate how long a job will take before running it. After running your MapReduce job, you can estimate the time it took. MapReduce run time always depends on your cluster capacity (RAM size, CPU cores, and network bandwidth) and on how many reducers you set for the task.

You can only make rough assumptions, for example based on your input size divided by the input split size.
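One concrete form of that rough assumption is counting input splits, since each split becomes one map task. A minimal sketch (assuming the default 128 MiB split size of Hadoop 2.x and later; older versions defaulted to 64 MiB, and the actual split size depends on the job's configuration):

```python
def estimate_map_tasks(input_size_bytes, split_size_bytes=128 * 1024 * 1024):
    """Rough count of map tasks: one per input split (ceiling division)."""
    return -(-input_size_bytes // split_size_bytes)

# e.g. a 1 GiB input with the default 128 MiB split size
print(estimate_map_tasks(1024 * 1024 * 1024))  # -> 8
```

This gives a count of tasks, not a run time: how long each task takes still depends on the cluster's RAM, CPU cores, and network bandwidth, which is the answer's point.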
