HBase regions automatic splitting using hbase.hregion.max.filesize


Problem description



I'm using the Cloudera distribution of HBase (hbase-0.94.6-cdh4.5.0) and Cloudera Manager to set up all of the cluster's configuration.

I have set the following property for HBase:

<property>
<name>hbase.hregion.max.filesize</name>
<value>10737418240</value>
<source>hbase-default.xml</source>
</property>

NB: 10737418240 bytes = 10G

So, according to all the documentation I have read, data should accumulate in a single region until the region size reaches 10G.

But it doesn't seem to work... Maybe I'm missing something...

Here are all the regions of my HBase table and their sizes:

root@hadoopmaster01:~# hdfs dfs -du -h /hbase/my_table
719 /hbase/my_table/.tableinfo.0000000001
0 /hbase/my_table/.tmp
222.2 M /hbase/my_table/08e225d0ae802ef805fff65c89a15de6
602.7 M /hbase/my_table/0f3bb09af53ebdf5e538b50d7f08786e
735.1 M /hbase/my_table/1152669b3ef439f08614e3785451c305
2.8 G /hbase/my_table/1203fbc208fc93a702c67130047a1e4f
379.3 M /hbase/my_table/1742b0e038ece763184829e25067f138
7.3 G /hbase/my_table/194eae40d50554ce39c82dd8b2785d96
627.1 M /hbase/my_table/28aa1df8140f4eb289db76a17c583028
274.6 M /hbase/my_table/2f55b9760dbcaefca0e1064ce5da6f48
1.5 G /hbase/my_table/392f6070132ec9505d7aaecdc1202418
1.5 G /hbase/my_table/4396a8d8c5663de237574b967bf49b8a
1.6 G /hbase/my_table/440964e857d9beee1c24104bd96b7d5c
1.5 G /hbase/my_table/533369f47a365ab06f863d02c88f89e2
2.5 G /hbase/my_table/6d86b7199c128ae891b84fd9b1ccfd6e
1.2 G /hbase/my_table/6e5e6878028841c4d1f4c3b64d04698b
1.6 G /hbase/my_table/7dc1c717de025f3c15aa087cda5f76d2
200.2 M /hbase/my_table/8157d48f833bb3b708726c703874569d
118.0 M /hbase/my_table/85fb1d24bf9d03d748f615d3907589f2
2.0 G /hbase/my_table/94dd01c81c73dc35c02b6bd2c17d8d22
265.1 M /hbase/my_table/990d5adb14b2d1c936bd4a9c726f8e03
335.0 M /hbase/my_table/a9b673c142346014e01d7cf579b0e58a
502.1 M /hbase/my_table/ae3b1f6f537826f1bdb31bfc89d8ff9a
763.3 M /hbase/my_table/b6039c539b6cca2826022f863ed76c7b
470.7 M /hbase/my_table/be091ead2a408df55999950dcff6e7bc
5.9 G /hbase/my_table/c176cf8c19cc0fffab2af63ee7d1ca45
512.0 M /hbase/my_table/cb622a8a55ba575549759514281d5841
1.9 G /hbase/my_table/d201d1630ffdf08e4114dfc691488372
787.9 M /hbase/my_table/d78b4f682bb8e666488b06d0fd00ef9b
862.8 M /hbase/my_table/edd72e02de2a90aab086acd296d7da2b
627.5 M /hbase/my_table/f13a251ff7154f522e47bd54f0d1f921
1.3 G /hbase/my_table/fde68ec48d68e7f61a0258b7f8898be4

As you can see, there are a lot of regions, and none of them has a size close to 10G...
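The same check can be done programmatically. The following is a minimal Python sketch, not part of the original question, that parses a few of the `hdfs dfs -du -h` lines shown above and compares the largest region with the configured limit. The helper name `parse_du_line` and the unit table are illustrative assumptions, and the units are assumed to be binary multiples (1 G = 1024^3 bytes).

```python
import re

# Assumed unit multipliers for the human-readable sizes (binary multiples).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Value of hbase.hregion.max.filesize from the question (10G).
MAX_FILESIZE = 10737418240


def parse_du_line(line):
    """Parse one '<size> [unit] <path>' line into (size_in_bytes, path)."""
    match = re.match(r"^\s*([\d.]+)\s*([KMGT]?)\s+(/\S+)\s*$", line)
    if match is None:
        raise ValueError(f"unrecognized du line: {line!r}")
    number, unit, path = match.groups()
    return int(float(number) * UNITS[unit]), path


# A few region lines copied from the listing in the question.
sample = """\
222.2 M /hbase/my_table/08e225d0ae802ef805fff65c89a15de6
2.8 G /hbase/my_table/1203fbc208fc93a702c67130047a1e4f
7.3 G /hbase/my_table/194eae40d50554ce39c82dd8b2785d96
5.9 G /hbase/my_table/c176cf8c19cc0fffab2af63ee7d1ca45
"""

regions = [parse_du_line(line) for line in sample.splitlines()]
largest_size, largest_path = max(regions)

print(f"largest region: {largest_path} ({largest_size} bytes)")
print(f"below configured limit: {largest_size < MAX_FILESIZE}")
```

Even the largest region in the sample (7.3 G) is below the 10G limit, which is what prompted the question.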

If someone has faced this kind of issue or knows whether there is another configuration to set, please help me!

Thx

Solution

@mpiffaretti, what you are seeing is expected. I was also a little shocked when I saw the region sizes after an automatic split for the first time.

In HBase 0.94+, the default split policy is IncreasingToUpperBoundRegionSplitPolicy. The size at which a region splits is decided by the algorithm described below.

Split size is the number of regions on this server that belong to the same table, cubed, times 2x the region flush size, OR the maximum region split size, whichever is smaller. For example, if the flush size is 128M, then after two flushes (256MB) we will split, which will make two regions that will split when their size is 2^3 * 128M * 2 = 2048M. If one of these regions splits, then there are three regions and now the split size is 3^3 * 128M * 2 = 6912M, and so on until we reach the configured maximum filesize; from there on out, we'll use that.
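To make the arithmetic concrete, here is a small Python sketch of the rule exactly as quoted above. It is not HBase code: the function name `split_size` and its parameters are illustrative, and the 128M flush size and 10G maximum are taken from the example and the question.

```python
MB = 1024 * 1024


def split_size(region_count, flush_size, max_filesize):
    """Size at which a region splits, following the rule quoted above.

    region_count: regions of the same table hosted on this region server (>= 1).
    flush_size:   region flush size in bytes.
    max_filesize: hbase.hregion.max.filesize in bytes.
    """
    if region_count < 1:
        raise ValueError("region_count must be at least 1")
    # (number of regions)^3 * 2 * flush size, capped at the configured maximum.
    return min(max_filesize, region_count ** 3 * 2 * flush_size)


flush_size = 128 * MB
max_filesize = 10737418240  # 10G, the value set in the question

for count in range(1, 6):
    size_mb = split_size(count, flush_size, max_filesize) // MB
    print(f"{count} region(s) on the server: split at {size_mb} MB")
```

Under these assumptions the thresholds are 256 MB, 2048 MB and 6912 MB for one, two and three regions. The uncapped value for four regions would be 4^3 * 128M * 2 = 16384 MB, so from the fourth region on the 10G limit (10240 MB) applies. That is why the table in the question contains many regions well below 10G.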

This is quite a nice strategy since you start to get a nice spread of regions over the region servers without having to wait until they reach the 10GB limit.

Alternatively, you would be better off pre-splitting your tables, since you want to make sure that you are getting the most out of the processing power of your cluster: if you have a single region, all requests will go to the region server to which that region is assigned. Pre-splitting puts control over how the regions are split across the row-key space into your hands.
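As an illustration of choosing split points up front, the sketch below generates evenly spaced split keys. It assumes, purely for the example, that row keys start with a uniformly distributed lowercase hexadecimal prefix (for instance a hash of the natural key); the function name `hex_split_points` is hypothetical and not part of any HBase API.

```python
def hex_split_points(num_regions, prefix_len=2):
    """Return num_regions - 1 evenly spaced hex prefixes to use as split keys.

    Assumes row keys begin with a uniformly distributed lowercase hex prefix
    of at least prefix_len characters.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    keyspace = 16 ** prefix_len
    if num_regions > keyspace:
        raise ValueError("prefix_len is too short for that many regions")
    return [
        format(i * keyspace // num_regions, f"0{prefix_len}x")
        for i in range(1, num_regions)
    ]


print(hex_split_points(4))  # ['40', '80', 'c0']
```

A list like this can then be supplied when the table is created, for example through the SPLITS option of the HBase shell `create` command. If the real row keys are not uniformly distributed, the split points should be derived from the actual key distribution instead.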
