Should I use PrefixFilter or a rowkey range scan in HBase?


Question

I don't know why my query is so slow when I use PrefixFilter. Can someone explain the best way to query HBase? Thanks.

hbase(main):002:0> scan 'userlib',{FILTER=>org.apache.hadoop.hbase.filter.PrefixFilter.new(org.apache.hadoop.hbase.util.Bytes.toBytes('0000115831F8'))}
ROW               COLUMN+CELL                                                                                                                                
0000115831F8001   column=track:aid, timestamp=1339121507633, value=aaa                                                                                       
1 row(s) in 41.0700 seconds

hbase(main):002:0> scan 'userlib',{STARTROW=>'0000115831F8',ENDROW=>'0000115831F9'}                                                                                        
ROW               COLUMN+CELL                                                                                                                                
0000115831F8001   column=track:aid, timestamp=1339121507633, value=aaa                                                                                       
1 row(s) in 0.1100 seconds

Recommended answer

HBase filters, even row filters, are really slow, because in most cases they do a complete table scan and then filter the results. Generally, a filter is evaluated against the rows the scan visits; on its own it does not narrow the range of rows that is read. Have a look at this discussion: http://grokbase.com/p/hbase/user/115cg0d7jh/very-slow-scan-performance-using-filters

Row key range scans, however, are much faster: they do the equivalent of a filtered table scan without reading the rows outside the range. This is because row keys are stored in sorted order (one of the basic guarantees of HBase, which is a BigTable-like solution), so range scans on row keys are very fast. More explanation here: http://www.quora.com/How-feasible-is-real-time-querying-on-HBase-Can-it-be-achieved-through-a-programming-language-such-as-Python-PHP-or-JSP
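To make the difference concrete, here is a minimal sketch in plain Python (not HBase client code). It assumes a small, made-up list of sorted row keys standing in for the table; `ROW_KEYS`, `range_scan`, and `filtered_full_scan` are hypothetical names used only for illustration. Because the keys are sorted, the range scan can locate its boundaries by binary search, while the filtered scan has to visit every row.

```python
from bisect import bisect_left

# Hypothetical sorted row keys standing in for an HBase table (16 rows).
ROW_KEYS = sorted(f"{n:012X}001" for n in range(0x115831F0, 0x11583200))


def range_scan(keys, start_row, stop_row):
    """Return the keys in [start_row, stop_row) using binary search on sorted keys."""
    lo = bisect_left(keys, start_row)
    hi = bisect_left(keys, stop_row)
    return keys[lo:hi]


def filtered_full_scan(keys, prefix):
    """Visit every key and keep the ones matching prefix.

    Returns (matches, rows_visited).
    """
    matches = []
    visited = 0
    for key in keys:
        visited += 1
        if key.startswith(prefix):
            matches.append(key)
    return matches, visited


if __name__ == "__main__":
    print(range_scan(ROW_KEYS, "0000115831F8", "0000115831F9"))
    print(filtered_full_scan(ROW_KEYS, "0000115831F8"))
```

Both calls find the same row, but the filtered scan touches all 16 rows in this toy table. Python compares these strings by code point, which for ASCII keys like these gives the same order as a byte-wise comparison.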

[UPDATE 1] It turns out that PrefixFilter scans from the beginning of the table until it passes the prefix used in the filter (if it finds it). The recommendation for fast performance with PrefixFilter seems to be to specify a start row (the start_row parameter) in addition to the PrefixFilter. See the related 2013 discussion on the hbase-user mailing list.
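The behaviour described in this update can be imitated with a small simulation. This is a sketch in plain Python under stated assumptions, not the HBase implementation: `prefix_filter_scan` is a hypothetical helper, the row keys are made up, and "rows examined" simply counts the rows the simulated filter is asked about.

```python
from bisect import bisect_left


def prefix_filter_scan(sorted_keys, prefix, start_row=""):
    """Simulate a forward scan with a prefix filter.

    The scan seeks to start_row, skips rows that sort before the prefix,
    and stops at the first row that sorts after the prefix range.
    Returns (matches, rows_examined).
    """
    matches = []
    examined = 0
    first = bisect_left(sorted_keys, start_row)  # seek straight to the start row
    for key in sorted_keys[first:]:
        examined += 1
        if key.startswith(prefix):
            matches.append(key)
        elif key > prefix:
            break  # past the prefix range, so nothing further can match
    return matches, examined


# Hypothetical table of 16 sorted row keys.
keys = sorted(f"{n:012X}001" for n in range(0x115831F0, 0x11583200))

no_start = prefix_filter_scan(keys, "0000115831F8")
with_start = prefix_filter_scan(keys, "0000115831F8", start_row="0000115831F8")

if __name__ == "__main__":
    print(no_start)
    print(with_start)
```

Without a start row, the simulated scan examines every row that sorts before the prefix. With the start row set to the prefix, it examines only the matching row and the one row that ends the scan. On a real table the rows before the prefix can number in the millions, which is consistent with the 41-second result in the question.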

[UPDATE 2, from @aaa90210] Regarding the update above, there is now an efficient row prefix filter that is much faster than PrefixFilter. See this answer: https://stackoverflow.com/a/38632100/150050
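One common way to make a prefix query efficient is to convert the prefix into a start row and a stop row, which is what the fast scan in the question does by hand ('0000115831F8' to '0000115831F9'). The sketch below shows that conversion in plain Python. `stop_row_for_prefix` is a hypothetical helper, and whether the linked answer uses exactly this approach is an assumption.

```python
def stop_row_for_prefix(prefix: bytes) -> bytes:
    """Return the smallest row key that sorts after every key starting with prefix.

    Trailing 0xFF bytes cannot be incremented, so they are dropped first.
    An empty result means there is no upper bound (scan to the end of the table).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    # Increment the last remaining byte to get the exclusive upper bound.
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


if __name__ == "__main__":
    prefix = b"0000115831F8"
    print(prefix, stop_row_for_prefix(prefix))
```

The start row is the prefix itself, and the stop row is exclusive, so the pair covers exactly the keys that begin with the prefix.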
