HBase: how to choose pre-split strategies and how they affect your rowkeys


Problem description

    I am trying to pre-split an HBase table. One of the HBaseAdmin Java API methods creates a table from a start key, an end key, and a number of regions. This is the method I am using: void createTable(HTableDescriptor desc, byte[] startKey, byte[] endKey, int numRegions)

    Is there any recommendation for choosing the start key and end key based on the dataset?

    My approach: say the dataset has 100 records and I want the data divided across roughly 10 regions, so that each holds about 10 records. To find the start key I would run scan '/mytable', {LIMIT => 10} and pick the last rowkey as my start key, then run scan '/mytable', {LIMIT => 90} and pick the last rowkey as my end key.

    Does this approach to finding the start key and end key look OK, or is there a better practice?

    EDIT: I tried the following approaches to pre-split an empty table. None of the three worked the way I used them. I think I will need to salt the key to get an equal distribution.

    P.S. I am showing only some of the region info.

    1)

    byte[][] splits = new RegionSplitter.HexStringSplit().split(10);
    hBaseAdmin.createTable(tabledescriptor, splits);
    

    This gives regions with boundaries like:

    {
        "startkey":"-INFINITY",
        "endkey":"11111111",
        "numberofrows":3628951,
    },
    {
        "startkey":"11111111",
        "endkey":"22222222",
    },
    {   
        "startkey":"22222222",
        "endkey":"33333333",
    },
    {
        "startkey":"33333333",
        "endkey":"44444444",
    },
    {
        "startkey":"88888888",
        "endkey":"99999999",
    },
    {
        "startkey":"99999999",
        "endkey":"aaaaaaaa",
    },
    {
        "startkey":"aaaaaaaa",
        "endkey":"bbbbbbbb",
    },
    {
        "startkey":"eeeeeeee",
        "endkey":"INFINITY",
    }
    

    This is useless because my rowkeys have a composite form, 'deptId|month|roleId|regionId', and don't fit into the boundaries above.

    2)

    byte[][] splits = new RegionSplitter.UniformSplit().split(10);
    hBaseAdmin.createTable(tabledescriptor, splits);
    

    This has the same issue:

    {
        "startkey":"-INFINITY",
        "endkey":"\\x19\\x99\\x99\\x99\\x99\\x99\\x99\\x99",
    }
    {
        "startkey":"\\x19\\x99\\x99\\x99\\x99\\x99\\x99\\x99",
        "endkey":"33333332",
    }
    {
        "startkey":"33333332",
        "endkey":"L\\xCC\\xCC\\xCC\\xCC\\xCC\\xCC\\xCB",
    }
    {
        "startkey":"\\xE6ffffffa",
        "endkey":"INFINITY",
    }
    

    3) I tried supplying a start key and an end key and got the following useless regions.

    hBaseAdmin.createTable(tabledescriptor, Bytes.toBytes("04120|200808|805|1999"),
                                   Bytes.toBytes("01253|201501|805|1999"), 10);
    {
        "startkey":"-INFINITY",
        "endkey":"04120|200808|805|1999",
    }
    {
        "startkey":"04120|200808|805|1999",
        "endkey":"000PTP\\xDC200W\\xD07\\x9C805|1999",
    }
    {
        "startkey":"000PTP\\xDC200W\\xD07\\x9C805|1999",
        "endkey":"000ptq<200wp6\\xBC805|1999",
    }
    {
        "startkey":"001\\x11\\x15\\x13\\x1C201\\x15\\x902\\x5C805|1999",
        "endkey":"01253|201501|805|1999",
    }
    {
        "startkey":"01253|201501|805|1999",
        "endkey":"INFINITY",
    }
    

    Solution

    First question: In my experience with HBase, I am not aware of any hard rule for choosing the number of regions or the start key and end key.

    The underlying principle is that, with your rowkey design, data should be distributed across the regions and not hotspotted (see 36.1. Hotspotting in the HBase reference guide).

    However, even if you define a fixed number of regions, such as the 10 you mentioned, there may no longer be 10 after a heavy data load. Once a region grows past a certain size limit it splits again, so the number of regions increases.

    For the createTable overload you are using, the HBaseAdmin documentation says: "Creates a new table with the specified number of regions. The start key specified will become the end key of the first region of the table, and the end key specified will become the start key of the last region of the table (the first region has a null start key and the last region has a null end key)."
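To illustrate why the intermediate boundaries in attempt 3 come out as unreadable bytes, here is a simplified sketch of even interpolation between two keys. It is not HBase's actual code, only an assumption-laden approximation of the idea: both keys are read as large unsigned integers and the range between them is cut into equal steps. The class and method names (RangeInterpolation, interpolate, toFixedLength) are hypothetical, and the sketch assumes both keys have the same length and that the start key sorts before the end key.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class RangeInterpolation {

    // Returns parts + 1 boundaries: the start key, evenly spaced intermediates, the end key.
    // Assumes both keys have the same length and start < end in byte order.
    static List<byte[]> interpolate(byte[] start, byte[] end, int parts) {
        BigInteger lo = new BigInteger(1, start); // signum 1: read the bytes as unsigned
        BigInteger hi = new BigInteger(1, end);
        BigInteger span = hi.subtract(lo);
        List<byte[]> boundaries = new ArrayList<>();
        for (int i = 0; i <= parts; i++) {
            BigInteger offset = span.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(parts));
            boundaries.add(toFixedLength(lo.add(offset), start.length));
        }
        return boundaries;
    }

    // Converts the value back to exactly `length` big-endian bytes,
    // dropping BigInteger's sign byte or left-padding with zeros as needed.
    static byte[] toFixedLength(BigInteger value, int length) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[length];
        int copy = Math.min(raw.length, length);
        System.arraycopy(raw, raw.length - copy, out, length - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        byte[] start = "01253|201501".getBytes(StandardCharsets.US_ASCII);
        byte[] end = "04120|200808".getBytes(StandardCharsets.US_ASCII);
        for (byte[] boundary : interpolate(start, end, 4)) {
            // ISO_8859_1 maps every byte to one character, so nothing is lost when printing
            System.out.println(new String(boundary, StandardCharsets.ISO_8859_1));
        }
    }
}
```

Because the arithmetic works on raw byte values, the intermediate boundaries are not constrained to digits or to the '|' separator, which is consistent with the boundaries shown in the question.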

    Moreover, I prefer to create the table through a script with pre-splits, say 0-10, and to design the rowkey so that it is salted and lands on one of the region servers, which avoids hotspotting.
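A minimal sketch of that salting idea, under stated assumptions: the salt is a single decimal digit derived from the key's hash, the bucket count of 10 matches the number of pre-split regions, and the class and method names (SaltedKeys, salt, splitPoints) are hypothetical. The resulting split points would be passed to hBaseAdmin.createTable(tabledescriptor, splits) as in the question.

```java
import java.nio.charset.StandardCharsets;

final class SaltedKeys {
    // Assumption: one salt bucket per pre-split region.
    static final int BUCKETS = 10;

    // Prefixes the rowkey with a stable one-digit bucket derived from its hash.
    static String salt(String rowKey) {
        int bucket = Math.floorMod(rowKey.hashCode(), BUCKETS);
        return bucket + "|" + rowKey;
    }

    // Split points "1" .. "9" give 10 regions, one per salt digit.
    static byte[][] splitPoints() {
        byte[][] splits = new byte[BUCKETS - 1][];
        for (int i = 1; i < BUCKETS; i++) {
            splits[i - 1] = String.valueOf(i).getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        // The same rowkey always receives the same salt, so point reads stay possible.
        System.out.println(salt("04120|200808|805|1999"));
        System.out.println(splitPoints().length + " split points");
    }
}
```

The trade-off is that a range scan over one deptId now needs one scan per bucket, because rows that were adjacent are spread over all regions.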

    EDIT: If you want to implement your own region split, you can provide your own implementation of org.apache.hadoop.hbase.util.RegionSplitter.SplitAlgorithm and override

    public byte[][] split(int numberOfSplits)
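As a rough sketch of what such a split(int) method could compute for the composite rowkeys in the question, the example below divides the deptId space evenly. It is a standalone class and deliberately does not implement the real SplitAlgorithm interface, which declares further methods that would also have to be implemented. The class name DeptIdSplit and the assumption that deptId is always five zero-padded decimal digits are hypothetical, inferred only from the sample keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class DeptIdSplit {
    // Assumption: deptId is five zero-padded decimal digits, i.e. 00000..99999.
    static final int DEPT_ID_SPACE = 100_000;

    // Mirrors the shape of split(int): returns numRegions - 1 split points.
    static byte[][] split(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            long boundary = (long) DEPT_ID_SPACE * i / numRegions;
            splits[i - 1] = String.format(Locale.ROOT, "%05d", boundary)
                                  .getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] split : split(10)) {
            System.out.println(new String(split, StandardCharsets.US_ASCII));
        }
    }
}
```

Even splits over the ID space only balance the regions if the deptId values themselves are spread evenly; if most rows share a few deptIds, the load will still concentrate on a few regions.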
    

    Second question: As I understand it, you want to find the start rowkey and end rowkey for the data inserted into a specific table. Here are some ways to do that.

    • To find the start and end rowkeys, scan the 'hbase:meta' table (named '.META.' before HBase 0.96), which records the start and end rowkey of each region.

    • You can open the master UI at http://hbasemaster:60010 (the default port is 16010 from HBase 1.0 onward) to see how the rowkeys are spread across the regions; the start and end rowkeys of each region are listed there.

    • To see how your keys are organized after pre-splitting the table and inserting data into HBase, use FirstKeyOnlyFilter.

    For example: scan 'yourtablename', FILTER => 'FirstKeyOnlyFilter()' displays all 100 of your rowkeys.

    If you have a huge amount of data (not 100 rows as you mentioned) and want to dump all the rowkeys, you can run the following from outside the shell:

    echo "scan 'yourtablename', FILTER => 'FirstKeyOnlyFilter()'" | hbase shell > rowkeys.txt
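Once the rowkeys have been dumped, split points can be taken from the data itself rather than guessed. The sketch below picks keys at evenly spaced positions in a sorted list, which is the idea behind the LIMIT-based approach in the question, generalized to every boundary. It assumes the rowkeys have already been parsed out of the shell output into a sorted list with at least as many entries as regions; the class name QuantileSplits and the generated sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class QuantileSplits {

    // Picks numRegions - 1 keys at evenly spaced positions in a sorted rowkey list.
    // Assumes sortedRowKeys is in ascending order and has at least numRegions entries.
    static List<String> splitPoints(List<String> sortedRowKeys, int numRegions) {
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            int index = (int) ((long) sortedRowKeys.size() * i / numRegions);
            splits.add(sortedRowKeys.get(index));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up sample keys, generated in ascending order as a scan would return them.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            keys.add(String.format(Locale.ROOT, "%05d|201501|805|1999", i * 37));
        }
        System.out.println(splitPoints(keys, 10));
    }
}
```

With 100 keys and 10 regions this selects the keys at positions 10, 20, and so on up to 90, so each region starts out with roughly 10 of the sampled rows. The boundaries only stay balanced as long as new data follows the same distribution as the sample.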
    
