Tuning Hadoop parameters
Problem description
Is there a way to fine-tune Hadoop configuration parameters without having to run tests for every possible combination? I am currently working on an 8-node cluster, and I want to optimize the performance of MapReduce tasks as well as Spark performance (running on top of HDFS).
Solution
The short answer is NO. You need to play around and run smoke tests to determine optimal performance for your cluster. So I would start by checking out these links:
- https://community.hortonworks.com/articles/103176/hdfs-settings-for-better-hadoop-performance.html
- http://crazyadmins.com/tune-hadoop-cluster-to-get-maximum-performance-part-1/
- http://crazyadmins.com/tune-hadoop-cluster-to-get-maximum-performance-part-2/
Some topics discussed that will affect MapReduce jobs:
- Configure HDFS block size for optimal performance
- Avoid file sizes that are smaller than a block size
- Tune DataNode JVM for optimal performance
- Enable HDFS short circuit reads
- Avoid reads or writes from stale DataNodes
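The HDFS-side items in that list map to a handful of properties in `hdfs-site.xml`. As a rough sketch (the property names are standard HDFS ones, but the values shown are illustrative starting points rather than the referenced articles' exact recommendations):

```xml
<!-- hdfs-site.xml: illustrative values only; tune for your own workload -->
<configuration>
  <!-- Larger blocks mean less NameNode metadata and fewer mappers for big files -->
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value> <!-- 256 MB -->
  </property>
  <!-- Short-circuit reads let a client bypass the DataNode for node-local data -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
  <!-- Steer reads and writes away from stale DataNodes -->
  <property>
    <name>dfs.namenode.avoid.read.stale.datanode</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.avoid.write.stale.datanode</name>
    <value>true</value>
  </property>
</configuration>
```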
To give you an idea of how a 4-node cluster with 32 cores and 128 GB RAM per node is set up in YARN/TEZ (from "Hadoop multinode cluster too slow. How do I increase speed of data processing?"):
For Tez: divide RAM by cores to get the max Tez container size. So in my case: 128 GB / 32 cores = 4 GB.
TEZ:
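The original answer's Tez settings did not survive extraction here. As a sketch only, settings consistent with the 4 GB container size derived above might look like the following; the property names are standard Tez/Hive ones, but the heap (~80% of container) and sort-buffer (~40% of container) ratios are my assumptions, not values from the quoted answer:

```xml
<!-- tez-site.xml / hive-site.xml: illustrative values for 4 GB containers -->
<property>
  <name>tez.am.resource.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>hive.tez.container.size</name>
  <value>4096</value>
</property>
<!-- JVM heap inside the container, assumed ~80% of container size -->
<property>
  <name>tez.am.launch.cmd-opts</name>
  <value>-Xmx3276m</value>
</property>
<!-- Sort buffer, assumed ~40% of container size -->
<property>
  <name>tez.runtime.io.sort.mb</name>
  <value>1638</value>
</property>
```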
YARN:
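Likewise, the YARN values were lost in extraction. Assuming roughly 76 GB of each node's 128 GB is handed to YARN (per the author's note about 76 GB working better than the recommendations), a sketch might look like this; the values are back-of-envelope assumptions, not the quoted answer's exact settings:

```xml
<!-- yarn-site.xml: illustrative values for ~76 GB usable RAM per node -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>77824</value> <!-- 76 GB -->
</property>
<!-- Minimum allocation matching the 4 GB container size derived above -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>77824</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>32</value>
</property>
```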
I like to give YARN the maximum RAM I can spare per node. Mine is a little higher than the recommendations, but the recommended values cause crashes in TEZ/MR jobs, so 76 GB works better in my case. You need to play with all these values!