hadoop and hbase rebalancing after node additions


Question


I have a fundamental question about load balancers. I just finished adding new nodes to our hadoop (2.3) cluster, which also has hbase v0.98. After the addition, with all nodes online in hadoop and hbase:

1. How is hbase affected by the hadoop rebalancer? Do I need to explicitly rebalance hbase after the hadoop rebalance?

2. My Hadoop cluster is entirely occupied by hbase. Setting balancer_switch=true, will it automatically rebalance hbase and hadoop?

3. What is the best way to make sure that both hadoop and hbase are rebalanced and work fine too?

Solution

1. The Hadoop (HDFS) balancer moves blocks from one node to another to try to give each datanode the same amount of data (within a configurable threshold). This messes up HBase's data locality, meaning that a particular region may be serving a file that is no longer on its local host.

2. HBase's balance_switch balances the cluster so that each regionserver hosts the same number of regions (or close to it). This is separate from Hadoop's (HDFS) balancer.

3. If you are running only HBase, I recommend not running Hadoop's (HDFS) balancer, as it will cause certain regions to lose their data locality. This forces any request to that region to go over the network to one of the datanodes serving its HFile.
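For reference, the two balancers described above are invoked separately. A minimal sketch of each, assuming a standard Hadoop 2.x install with the `hdfs` CLI on the PATH and the HBase 0.98 shell (run the HDFS balancer only if the locality cost described above is acceptable):

```shell
# HDFS balancer: move blocks until every datanode's utilization is within
# the given percentage of the cluster-wide average (10 is the default).
# Only run this if losing HBase data locality is acceptable.
hdfs balancer -threshold 10

# HBase region balancer: enable it and trigger a balancing pass from the
# HBase shell. balance_switch prints the previous on/off state.
hbase shell <<'EOF'
balance_switch true
balancer
EOF
```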

HBase's data locality is recovered, though. Whenever compaction occurs, all the blocks are copied locally to the regionserver serving that region and merged; at that point, data locality is recovered for that region. With that, all you really need to do is add the new nodes to the cluster. HBase will take care of rebalancing the regions, and once those regions compact, data locality will be restored.
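Rather than waiting for a compaction to occur naturally, you can force one to restore locality sooner. A sketch from the HBase shell, where `'my_table'` is a placeholder table name:

```shell
# Major-compact a table: its HFiles are rewritten by the regionserver
# hosting each region, so the resulting blocks are local again.
hbase shell <<'EOF'
major_compact 'my_table'
EOF
```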

