Reuse JVM in Hadoop MapReduce jobs

Published: 2021/12/15 19:06:36. Tags: performance, hadoop, jvm, mapreduce


Question

I know we can set the property "mapred.job.reuse.jvm.num.tasks" to reuse JVMs. My questions are:

(1) How do I decide the number of tasks to set here: -1 or some positive integer?

(2) Is it a good idea to reuse JVMs and set this property to -1 in MapReduce jobs?

Thanks a lot!

Recommended answer

If you have very small tasks that are definitely running one after another, it is useful to set this property to -1 (meaning that a spawned JVM is reused an unlimited number of times). You then spawn only as many JVMs as there are task slots available to your job in the cluster, instead of one JVM per task.
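To make the JVM count argument concrete, here is a minimal sketch of the arithmetic. It is a simplified model, not Hadoop code: the class and method names are hypothetical, the task and slot counts are made-up inputs, and it assumes every reused JVM is filled up to its limit.

```java
// Simplified model (an assumption, not Hadoop code): estimates how many task
// JVMs one job spawns for a given value of mapred.job.reuse.jvm.num.tasks.
final class JvmReuseModel {

    /**
     * @param tasks total number of tasks in the job
     * @param slots task slots available to the job in the cluster
     * @param reuse property value: -1 = unlimited reuse, n >= 1 = at most n tasks per JVM
     * @return estimated number of JVMs spawned for the job
     */
    static long estimateJvms(long tasks, long slots, int reuse) {
        if (tasks < 0 || slots <= 0 || reuse == 0 || reuse < -1) {
            throw new IllegalArgumentException("invalid tasks/slots/reuse value");
        }
        // At least one JVM per concurrently running task.
        long concurrent = Math.min(tasks, slots);
        if (reuse == -1) {
            // Unlimited reuse: each slot keeps its JVM for the whole job.
            return concurrent;
        }
        // Limited reuse: every JVM exits after `reuse` tasks (ceiling division).
        long byReuseLimit = (tasks + reuse - 1) / reuse;
        return Math.max(concurrent, byReuseLimit);
    }

    public static void main(String[] args) {
        long tasks = 1000; // hypothetical job size
        long slots = 20;   // hypothetical slots available to the job
        System.out.println("reuse=1  -> " + estimateJvms(tasks, slots, 1));
        System.out.println("reuse=3  -> " + estimateJvms(tasks, slots, 3));
        System.out.println("reuse=-1 -> " + estimateJvms(tasks, slots, -1));
    }
}
```

With a value of 1 (no reuse) the model yields one JVM per task, while -1 caps the count at the number of slots, which is the saving described above.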

For such short tasks this is a huge performance improvement. In long-running jobs, the time spent setting up a new JVM is a very small percentage of the total runtime, so reuse does not give you a big performance boost.
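The difference between short and long tasks can be sketched as the share of wall-clock time spent starting the JVM. The class name, the one-second startup cost, and the task durations below are assumptions chosen for illustration, not measurements.

```java
// Illustrative arithmetic only: what fraction of a task's wall-clock time
// goes to JVM startup when a fresh JVM is spawned for every task.
final class JvmStartupShare {

    /**
     * @param startupSeconds assumed time to start one task JVM
     * @param taskSeconds    time the task itself runs
     * @return startup time as a fraction of total time, in [0, 1)
     */
    static double startupShare(double startupSeconds, double taskSeconds) {
        if (startupSeconds < 0 || taskSeconds <= 0) {
            throw new IllegalArgumentException("durations must be positive");
        }
        return startupSeconds / (startupSeconds + taskSeconds);
    }

    public static void main(String[] args) {
        double startup = 1.0; // assumed JVM startup cost in seconds
        double[] taskDurations = {4.0, 60.0, 600.0}; // hypothetical task runtimes
        for (double task : taskDurations) {
            System.out.printf("task=%.0fs startup share=%.2f%%%n",
                    task, 100.0 * startupShare(startup, task));
        }
    }
}
```

Under these assumed numbers, startup is a fifth of the time for a four-second task but well under one percent for a ten-minute task, which is why reuse pays off mainly for small tasks.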

Also, for long-running tasks it is good to recreate the task process, because issues like heap fragmentation can degrade performance over time.

In addition, if your jobs have medium-length tasks, you could reuse each JVM for just 2-3 tasks, which is a good trade-off.
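The three cases in this answer can be summarized as a small decision helper. This is only a sketch of the rule of thumb: the class name is hypothetical, and the boundaries between "short", "medium", and "long" tasks are not given in the answer, so the caller must supply them.

```java
import java.time.Duration;

// Hypothetical helper that encodes the rule of thumb from the answer.
// The thresholds are caller-supplied assumptions, not Hadoop defaults.
final class ReuseHeuristic {

    /**
     * @param typicalTask typical runtime of one task in the job
     * @param shortLimit  tasks shorter than this count as "very small"
     * @param longLimit   tasks at or above this count as "long-running"
     * @return suggested value for mapred.job.reuse.jvm.num.tasks
     */
    static int suggestReuse(Duration typicalTask, Duration shortLimit, Duration longLimit) {
        if (shortLimit.compareTo(longLimit) > 0) {
            throw new IllegalArgumentException("shortLimit must not exceed longLimit");
        }
        if (typicalTask.compareTo(shortLimit) < 0) {
            return -1; // very small tasks: reuse the JVM without limit
        }
        if (typicalTask.compareTo(longLimit) < 0) {
            return 3;  // medium tasks: reuse for a few tasks (the answer suggests 2-3)
        }
        return 1;      // long tasks: fresh JVM per task
    }

    public static void main(String[] args) {
        Duration shortLimit = Duration.ofSeconds(30); // assumed boundary
        Duration longLimit = Duration.ofMinutes(10);  // assumed boundary
        System.out.println(suggestReuse(Duration.ofSeconds(5), shortLimit, longLimit));
        System.out.println(suggestReuse(Duration.ofMinutes(2), shortLimit, longLimit));
        System.out.println(suggestReuse(Duration.ofHours(1), shortLimit, longLimit));
    }
}
```

The returned value is what you would then assign to mapred.job.reuse.jvm.num.tasks for the job; measuring your own workload is the only reliable way to pick the boundaries.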
