Is there a way in HBase to COUNT rows matching rowkey-search


Question

Let's say my row key has two parts (NUM1~NUM2).

I would like to do a count grouped by the first part of the row key. Is there a way to do this in HBase?

I can always do it as an M/R job that reads all the rows, groups them, and counts them, but I was wondering whether there is a way to do it in HBase itself.
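The grouping the question asks for can be sketched client-side in plain Java. This minimal sketch assumes the row keys are strings of the form NUM1~NUM2 that have already been read (for example by a scan that returns only row keys); the class RowKeyGroupCount and its method are hypothetical, not HBase API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts row keys per first part, assuming keys look like "NUM1~NUM2".
class RowKeyGroupCount {

    static Map<String, Long> countByFirstPart(List<String> rowKeys) {
        Map<String, Long> counts = new TreeMap<>(); // groups kept sorted by NUM1
        for (String key : rowKeys) {
            int sep = key.indexOf('~');
            // A key without '~' is counted under the whole key (assumption of this sketch).
            String first = sep >= 0 ? key.substring(0, sep) : key;
            counts.merge(first, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("100~1", "100~2", "200~1");
        System.out.println(countByFirstPart(keys)); // {100=2, 200=1}
    }
}
```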

Solution

Option 1:

You can use a prefix filter, something like the following.

PrefixFilter:

This filter takes one argument: a prefix of a row key. It returns only those key-values present in rows whose key starts with the specified prefix.

Syntax

PrefixFilter (<row_prefix>)

The same filter can be used with the Java client as well.

Examples using the HBase shell:

scan 'yourtable', {FILTER => "PrefixFilter('12345|abc|50|2016-05-05')"}

scan 'yourtable', {STARTROW => '12345', FILTER => "PrefixFilter('2016-05-05 08:10:10')"}

Adapt the prefix (and start row) to your requirement.

NOTE: the Java HBase Scan API offers the same options if you want to do this from Java.
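HBase stores rows sorted by row key, so all rows sharing a prefix sit in one contiguous range; counting them means starting at the prefix and stopping at the first key that no longer matches. The sketch below illustrates that with an in-memory TreeSet<String> standing in for the table (an assumption for illustration; PrefixCount is a hypothetical name). Against a real table you would run a Scan with a PrefixFilter, ideally with the start row set to the prefix, and count the returned rows.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper; a TreeSet of row keys stands in for a sorted HBase table.
class PrefixCount {

    // Walks forward from the first key >= prefix (like a scan whose start row is the
    // prefix) and stops at the first key that does not start with the prefix.
    static long countWithPrefix(NavigableSet<String> sortedKeys, String prefix) {
        long count = 0;
        for (String key : sortedKeys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break; // keys are sorted, so no later key can match
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys =
                new TreeSet<>(Arrays.asList("12345~a", "12345~b", "12346~a", "2~x"));
        System.out.println(countWithPrefix(keys, "12345~")); // 2
    }
}
```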

Option 2:

FuzzyRowFilter (see HBase: The Definitive Guide). This was really useful in our case: we have used it from bulk clients such as MapReduce jobs as well as from standalone HBase clients.

This filter acts on row keys, but in a fuzzy manner. It needs a list of row keys that should be returned, plus an accompanying byte[] array that signifies the importance of each byte in the row key. The constructor is as such:

FuzzyRowFilter(List<Pair<byte[], byte[]>> fuzzyKeysData)

The fuzzyKeysData specifies the mentioned significance of a row key byte, by taking one of two values:

0: Indicates that the byte at the same position in the row key must match as-is.
1: Means that the corresponding row key byte does not matter and is always accepted.
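The 0/1 rule above can be written as a few lines of plain Java. This is an illustrative sketch only (FuzzyMatch is a hypothetical class, not the FuzzyRowFilter source); it assumes that rows shorter than the fuzzy key never match and that only the first key.length bytes of a longer row are compared.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the fuzzy matching rule (not HBase's implementation).
class FuzzyMatch {

    // mask[i] == 0: row byte i must equal key[i]; mask[i] == 1: any byte is accepted.
    static boolean matches(byte[] row, byte[] key, byte[] mask) {
        if (row.length < key.length) {
            return false; // assumption of this sketch
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] key = ascii("row-?5");
        byte[] mask = {0, 0, 0, 0, 1, 0};
        // Rows row-01 .. row-20; prints row-05, row-15, then "2 matching rows".
        int hits = 0;
        for (int n = 1; n <= 20; n++) {
            String row = String.format("row-%02d", n);
            if (matches(ascii(row), key, mask)) {
                System.out.println(row);
                hits++;
            }
        }
        System.out.println(hits + " matching rows");
    }
}
```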

Example: Partial Row Key Matching

A possible example is matching partial keys, not from left to right but somewhere inside a compound key. Assume a row key format of <userId>_<actionId>_<year>_<month> with fixed-length parts, where <userId> is 4, <actionId> is 2, <year> is 4, and <month> is 2 bytes long. The application now requests all users that performed a certain action (encoded as 99) in January of any year. Then the pair for row key and fuzzy data would be the following:

row key = "????_99_????_01", where "?" is an arbitrary character, since it is ignored.
fuzzy data = "\x01\x01\x01\x01\x00\x00\x00\x00\x01\x01\x01\x01\x00\x00\x00"

In other words, the fuzzy data array instructs the filter to find all row keys matching "????_99_????_01", where each "?" accepts any character.
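The fuzzy data for the partial-key example can be derived mechanically: each "?" in the pattern becomes 1 and every other byte becomes 0. This sketch assumes the 15-byte pattern ????_99_????_01 (underscores included, which is what the 15-byte mask implies); FuzzyPattern and sample keys such as u001_99_2016_01 are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: builds the mask for a "?"-pattern and checks rows against it.
class FuzzyPattern {

    // '?' becomes a wildcard position (mask 1); every other character is fixed (mask 0).
    static byte[] maskFor(String pattern) {
        byte[] mask = new byte[pattern.length()];
        for (int i = 0; i < pattern.length(); i++) {
            mask[i] = (byte) (pattern.charAt(i) == '?' ? 1 : 0);
        }
        return mask;
    }

    // Fixed bytes must match, wildcard bytes are ignored.
    static boolean matches(String row, String pattern) {
        byte[] r = row.getBytes(StandardCharsets.US_ASCII);
        byte[] k = pattern.getBytes(StandardCharsets.US_ASCII);
        byte[] mask = maskFor(pattern);
        if (r.length < k.length) {
            return false; // assumption of this sketch
        }
        for (int i = 0; i < k.length; i++) {
            if (mask[i] == 0 && r[i] != k[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String pattern = "????_99_????_01";
        // Prints [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
        System.out.println(Arrays.toString(maskFor(pattern)));
        System.out.println(matches("u001_99_2016_01", pattern)); // true  (action 99, January)
        System.out.println(matches("u001_99_2016_02", pattern)); // false (February)
        System.out.println(matches("u001_42_2016_01", pattern)); // false (other action)
    }
}
```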

An advantage of this filter is that it can likely compute the next matching row key when it reaches the end of a matching one. It implements the getNextCellHint() method to help the servers fast-forward to the next range of rows that might match. This speeds up scanning, especially when the skipped ranges are quite large. Example 4-12 uses the filter to grab specific rows from a test data set.
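The fast-forwarding can be illustrated with a simplified calculation: when a fixed byte does not match, either raise that byte to the required value, or, if the required value is smaller, bump the last wildcard byte before it; then fill the rest with the smallest allowed values. This is a simplified sketch under stated assumptions (one fuzzy key, rows exactly as long as the key, unsigned byte order as HBase uses for row keys); it is not HBase's getNextCellHint() implementation, and FuzzyNextHint is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Simplified sketch of the "next possible match" idea (not HBase's actual code).
class FuzzyNextHint {

    // Smallest row >= `row` (unsigned lexicographic) that satisfies (key, mask), or null.
    static byte[] nextMatch(byte[] row, byte[] key, byte[] mask) {
        if (row.length != key.length || key.length != mask.length) {
            throw new IllegalArgumentException("sketch only handles equal lengths");
        }
        int n = row.length;
        int i = 0; // first fixed position whose byte differs
        while (i < n && (mask[i] == 1 || row[i] == key[i])) {
            i++;
        }
        if (i == n) {
            return row.clone(); // row already matches
        }
        byte[] next = row.clone();
        int pos;
        if ((key[i] & 0xff) > (row[i] & 0xff)) {
            pos = i; // raising the fixed byte is enough
            next[i] = key[i];
        } else {
            // Required byte is smaller: bump the last wildcard byte before i that can grow.
            pos = -1;
            for (int j = i - 1; j >= 0; j--) {
                if (mask[j] == 1 && (row[j] & 0xff) < 0xff) {
                    pos = j;
                    break;
                }
            }
            if (pos < 0) {
                return null; // no larger row can match
            }
            next[pos] = (byte) (row[pos] + 1);
        }
        // After pos: fixed bytes take the key value, wildcards the smallest byte (0).
        for (int j = pos + 1; j < n; j++) {
            next[j] = mask[j] == 0 ? key[j] : 0;
        }
        return next;
    }

    static String hint(String row, String key, byte[] mask) {
        byte[] next = nextMatch(row.getBytes(StandardCharsets.US_ASCII),
                key.getBytes(StandardCharsets.US_ASCII), mask);
        return next == null ? null : new String(next, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] mask = {0, 0, 0, 0, 1, 0};
        System.out.println(hint("row-03", "row-?5", mask)); // row-05
        System.out.println(hint("row-06", "row-?5", mask)); // row-15
        System.out.println(hint("row-16", "row-?5", mask)); // row-25
    }
}
```

For the row-?5 pattern, a scan positioned on row-06 can jump straight to row-15 instead of reading row-07 through row-14.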

Example using the fuzzy row filter

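import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

// Match rows like "row-?5": mask 0 = byte must match, 1 = any byte ("?" is ignored).
List<Pair<byte[], byte[]>> keys = new ArrayList<>();
keys.add(new Pair<>(
  Bytes.toBytes("row-?5"), new byte[] { 0, 0, 0, 0, 1, 0 }));
Filter filter = new FuzzyRowFilter(keys);

// Also add a single column to the scan to keep the output short.
Scan scan = new Scan()
  .addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-5"))
  .setFilter(filter);
// `table` is assumed to be an already-opened Table for the test data.
try (ResultScanner scanner = table.getScanner(scan)) {
  for (Result result : scanner) {
    System.out.println(result);
  }
}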

The example code also adds a filtering column to the scan, just to keep the output short:

Adding rows to table...
Results of scan:

keyvalues={row-05/colfam1:col-01/1/Put/vlen=9/seqid=0,
           row-05/colfam1:col-02/2/Put/vlen=9/seqid=0,
           ...
           row-05/colfam1:col-09/9/Put/vlen=9/seqid=0,
           row-05/colfam1:col-10/10/Put/vlen=9/seqid=0}
keyvalues={row-15/colfam1:col-01/1/Put/vlen=9/seqid=0,
           row-15/colfam1:col-02/2/Put/vlen=9/seqid=0,
           ...
           row-15/colfam1:col-09/9/Put/vlen=9/seqid=0,
           row-15/colfam1:col-10/10/Put/vlen=9/seqid=0}

The test code adds 20 rows to the table, named row-01 to row-20. We want to retrieve all the rows that match the pattern row-?5, in other words, all rows that end in the number 5. The output above confirms the correct result.
