HBase row key design for monotonically increasing keys


Problem description


    I have an HBase table where I'm writing the row keys like:

    <prefix>~1
    <prefix>~2
    <prefix>~3
    ...
    <prefix>~9
    <prefix>~10
    

    A scan in the HBase shell gives this output:

    <prefix>~1
    <prefix>~10
    <prefix>~2
    <prefix>~3
    ...
    <prefix>~9
    

    How should a row key be designed so that the row with key <prefix>~10 comes last? I'm looking for some recommended ways or the ways that are more popular for designing HBase row keys.

    Solution

    How should a row key be designed so that the row with key ~10 comes last?

    You see the scan output in this order because HBase keeps rowkeys sorted lexicographically, irrespective of the insertion order. Rowkeys are treated as byte arrays and compared byte by byte, so the comparison behaves like a string comparison rather than a numeric one. The lowest rowkey appears first in the table. That's why 10 appears before 2: <prefix>~1 is a prefix of <prefix>~10, so the two sort next to each other, and both sort ahead of any key starting with <prefix>~2. See the Rows section on this page to know more about this.
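The ordering described above can be reproduced outside HBase. The following is a minimal sketch, assuming the literal string `row` stands in for `<prefix>` (a placeholder, not something from the question). It relies only on the fact that Python compares `bytes` objects byte by byte, which is the same rule described for rowkeys.

```python
# Row keys in the order they were written.
# "row" is a hypothetical stand-in for <prefix>.
keys = [f"row~{i}".encode("utf-8") for i in range(1, 11)]

# bytes objects compare byte by byte, so sorting them mimics
# the order in which a scan would return the rows.
scan_order = sorted(keys)

for key in scan_order:
    print(key.decode("utf-8"))
```

The key ending in `~10` lands directly after the one ending in `~1`, matching the shell output in the question.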

    When you left-pad the integers with zeros, their natural ordering is kept intact under lexicographic sorting, so the scan order will be the same as the order in which you inserted the data. To do that, you can design your rowkeys as suggested by @shutty.
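A small sketch of the zero-padding idea, under the assumption that the numeric part never exceeds a fixed number of digits. The helper name `padded_key`, the `row` prefix, and the default width are illustrative choices, not part of the original answer.

```python
def padded_key(prefix: str, n: int, width: int = 10) -> bytes:
    """Build '<prefix>~<n>' with n left-padded with zeros to a fixed width."""
    if n < 0 or len(str(n)) > width:
        raise ValueError("n does not fit in the chosen width")
    return f"{prefix}~{n:0{width}d}".encode("utf-8")


# "row" is a hypothetical stand-in for <prefix>; width=4 covers 0..9999.
keys = [padded_key("row", i, width=4) for i in range(1, 11)]

# With fixed-width numbers, byte order and numeric order agree.
assert sorted(keys) == keys
```

The width has to be chosen up front to fit the largest number you expect, since keys written with a different width would no longer sort consistently with the existing ones.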

    I'm looking for some recommended ways or the ways that are more popular for designing HBase row keys.

    There are some general guidelines to follow in order to devise a good design:

    • Keep the rowkey as small as possible.
    • Avoid using monotonically increasing rowkeys, such as timestamps. This is a poor schema design and leads to RegionServer hotspotting, because all new writes land in the same region. If you can't avoid such keys, use some technique like hashing or salting to avoid hotspotting.
    • Avoid using Strings as rowkeys if possible. The String representation of a number takes more bytes than its integer or long representation. For example: a long is 8 bytes. You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes. If you stored this number as a String, presuming a byte per character, you would need nearly 3x the bytes.
    • Use some mechanism, like hashing, to get a uniform distribution of rows in case your regions are not evenly loaded. You could also create pre-split tables to achieve this.
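Two of the guidelines above can be sketched together: storing the number as 8 bytes instead of a string, and salting the key so that consecutive numbers spread over several buckets. This is an illustrative sketch only. The names `encode_long`, `salted_key`, and `NUM_BUCKETS`, the bucket count, and the choice of SHA-256 as the hash are all assumptions made here, not something the answer prescribes.

```python
import hashlib
import struct

# Hypothetical bucket count; must be at most 256 to fit in one byte.
NUM_BUCKETS = 16


def encode_long(n: int) -> bytes:
    """Encode a non-negative integer as 8 big-endian bytes.

    Big-endian keeps byte order consistent with numeric order.
    """
    return struct.pack(">Q", n)


def salted_key(n: int, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the encoded number with a one-byte bucket derived from its hash."""
    body = encode_long(n)
    bucket = hashlib.sha256(body).digest()[0] % buckets
    return bytes([bucket]) + body


largest = 18_446_744_073_709_551_615
print(len(encode_long(largest)), "bytes as a long")
print(len(str(largest)), "bytes as a string")
```

The trade-off is that salted keys no longer sit next to each other, so reading a numeric range back means scanning each bucket and merging the results. Because the bucket is derived from the key itself, a single row can still be fetched directly by recomputing its salt.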

    See this link for more on rowkey design.

    HTH
