Reuse JVM in Hadoop MapReduce jobs


Question

I know we can set the property "mapred.job.reuse.jvm.num.tasks" to reuse JVMs. My questions are:

(1) How do I decide the number of tasks to set here: -1, or some positive integer?

(2) Is it a good idea to always reuse JVMs and set this property to -1 in MapReduce jobs?

Thanks a lot!

Recommended answer

If you have very small tasks that are guaranteed to run one after another, it is useful to set this property to -1 (meaning that a spawned JVM is reused an unlimited number of times). You then spawn only as many JVMs as there are task slots in your cluster available to your job, instead of one JVM per task.
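The difference in JVM counts can be made concrete with a little arithmetic. The following is only a rough, idealized sketch: the class and method names are hypothetical (they are not part of Hadoop), it assumes a constant number of slots available to the job, and it ignores details such as map and reduce tasks being scheduled separately.

```java
// Illustrative arithmetic only; nothing here is part of the Hadoop API.
final class JvmReuseEstimate {

    /**
     * Rough estimate of how many JVMs one job spawns.
     *
     * @param totalTasks number of tasks in the job
     * @param slots      task slots available to the job (assumed constant)
     * @param reuse      value of mapred.job.reuse.jvm.num.tasks (-1 = unlimited)
     */
    static long jvmsSpawned(long totalTasks, long slots, int reuse) {
        if (totalTasks <= 0 || slots <= 0) {
            return 0;
        }
        long busySlots = Math.min(totalTasks, slots);
        if (reuse == -1) {
            return busySlots;              // one JVM per busy slot, reused for every task
        }
        if (reuse <= 1) {
            return totalTasks;             // no reuse: one JVM per task
        }
        // Each JVM runs up to `reuse` tasks, and every busy slot needs at least one JVM.
        long byReuse = (totalTasks + reuse - 1) / reuse;   // ceiling division
        return Math.max(busySlots, byReuse);
    }

    public static void main(String[] args) {
        System.out.println(jvmsSpawned(1000, 20, 1));   // 1000
        System.out.println(jvmsSpawned(1000, 20, -1));  // 20
        System.out.println(jvmsSpawned(1000, 20, 3));   // 334
    }
}
```

With 1000 small tasks and 20 slots, unlimited reuse cuts the number of JVM start-ups from 1000 to 20 under these assumptions.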

For such small tasks this is a huge performance improvement. In long-running jobs, the time spent setting up a new JVM is a very small fraction of the total runtime, so reuse doesn't give you a big performance boost.

Also, for long-running tasks it is good to recreate the task process, because issues like heap fragmentation can degrade your performance.

In addition, if your jobs have tasks of medium length, you could reuse each JVM for just 2-3 tasks, which is a good trade-off.
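The rule of thumb above could be written down roughly as follows. This is a minimal sketch, not a tuned recommendation: the class name and both duration cut-offs are hypothetical values chosen for illustration, and it assumes that a value of 1 means one task per JVM (no reuse).

```java
// Hypothetical helper that encodes the rule of thumb; not part of Hadoop.
final class ReuseHeuristic {

    static final String REUSE_PROPERTY = "mapred.job.reuse.jvm.num.tasks";

    // Assumed cut-offs, for illustration only; measure your own jobs.
    static final long SHORT_TASK_SECONDS = 30;
    static final long LONG_TASK_SECONDS = 600;

    /** Suggests a value for the reuse property from the average task duration. */
    static int suggestedReuse(long avgTaskSeconds) {
        if (avgTaskSeconds < SHORT_TASK_SECONDS) {
            return -1;   // very small tasks: reuse each JVM without limit
        }
        if (avgTaskSeconds >= LONG_TASK_SECONDS) {
            return 1;    // long tasks: fresh JVM per task (assumed to mean no reuse)
        }
        return 3;        // medium tasks: reuse each JVM for a few tasks
    }

    public static void main(String[] args) {
        long[] samples = {5, 120, 3600};
        for (long seconds : samples) {
            System.out.println(REUSE_PROPERTY + "=" + suggestedReuse(seconds)
                    + " for ~" + seconds + "s tasks");
        }
    }
}
```

The chosen value would then go into the job configuration under the property name from the question, for example in the job's configuration file or through the job's configuration object.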
