Tuning Hadoop parameters


Problem description

Is there a way to fine-tune Hadoop configuration parameters without having to run tests for every possible combination? I am currently working on an 8-node cluster and I want to optimize the performance of MapReduce tasks as well as Spark performance (running on top of HDFS).

Solution

The short answer is NO. You need to play around and run smoke tests to determine optimal performance for your cluster. So I would start by checking out these links:

Some topics discussed that will affect MapReduce jobs:

  • Configure HDFS block size for optimal performance
  • Avoid file sizes that are smaller than a block size
  • Tune DataNode JVM for optimal performance
  • Enable HDFS short circuit reads
  • Avoid reads or writes from stale DataNodes
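The HDFS-side items above map to a handful of standard `hdfs-site.xml` properties. The following is an illustrative sketch, not the author's configuration; the property names are real HDFS settings, but the values (256 MB blocks, the socket path) are assumptions you should tune for your own cluster:

```xml
<!-- hdfs-site.xml: illustrative settings for the topics above -->
<property>
  <name>dfs.blocksize</name>
  <value>268435456</value> <!-- 256 MB block size; pick based on typical file sizes -->
</property>
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value> <!-- enable HDFS short-circuit local reads -->
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value> <!-- UNIX domain socket required for short-circuit reads -->
</property>
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value> <!-- skip stale DataNodes for reads -->
</property>
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value> <!-- skip stale DataNodes for writes -->
</property>
```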

To give you an idea of how a 4-node cluster with 32 cores and 128 GB RAM per node is set up in YARN/TEZ (from Hadoop multinode cluster too slow. How do I increase speed of data processing?):

For Tez: divide RAM by cores to get the max TEZ container size. So in my case: 128 GB / 32 cores = 4 GB.

TEZ:
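A sketch of what `tez-site.xml` settings consistent with the 4 GB container derived above could look like. The property names are standard Tez settings; the specific values are illustrative assumptions, not the answer author's exact configuration:

```xml
<!-- tez-site.xml: illustrative values for a 4 GB max container (128 GB / 32 cores) -->
<property>
  <name>tez.am.resource.memory.mb</name>
  <value>4096</value> <!-- Application Master container sized to the max container -->
</property>
<property>
  <name>tez.task.resource.memory.mb</name>
  <value>4096</value> <!-- per-task container memory -->
</property>
<property>
  <name>tez.runtime.io.sort.mb</name>
  <value>1638</value> <!-- sort buffer; a common rule of thumb is ~40% of container memory -->
</property>
```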


YARN:
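A sketch of the corresponding `yarn-site.xml` values, assuming the 76 GB per node mentioned below is handed to YARN and containers run at the 4 GB TEZ size. The property names are standard YARN settings; the values are illustrative assumptions:

```xml
<!-- yarn-site.xml: illustrative values for 76 GB usable RAM on a 128 GB / 32-core node -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>77824</value> <!-- 76 GB of RAM given to YARN per node -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>77824</value> <!-- largest single container YARN will grant -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>4096</value> <!-- smallest container; matches the 4 GB TEZ container size -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>32</value> <!-- vcores exposed to YARN per node -->
</property>
```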

I like to run the maximum RAM I can spare per node with YARN. Mine is a little higher than the recommendations, but the recommended values caused crashes in TEZ/MR jobs, so 76 GB works better in my case. You need to play with all these values!

