Should I use PrefixFilter or a row key range scan in HBase?

Views: 2813 | Published: 2017/11/8 19:50:27 | Tags: performance, filter, hbase, database-scan


Question


I don't know why querying with a PrefixFilter is so slow. Can someone explain which way of querying is best? Thanks.
```
hbase(main):002:0> scan 'userlib',{FILTER=>org.apache.hadoop.hbase.filter.PrefixFilter.new(org.apache.hadoop.hbase.util.Bytes.toBytes('0000115831F8'))}
ROW                COLUMN+CELL
 0000115831F8001   column=track:aid, timestamp=1339121507633, value=aaa
1 row(s) in 41.0700 seconds

hbase(main):002:0> scan 'userlib',{STARTROW=>'0000115831F8',ENDROW=>'0000115831F9'}
ROW                COLUMN+CELL
 0000115831F8001   column=track:aid, timestamp=1339121507633, value=aaa
1 row(s) in 0.1100 seconds
```

Solution
HBase filters, even row filters, are really slow, because in most cases the scan still reads the entire table and the filter merely discards the rows that don't match. Have a look at this discussion: http://grokbase.com/p/hbase/user/115cg0d7jh/very-slow-scan-performance-using-filters

Row key range scans, however, are much faster: they return the same rows a filtered table scan would, but only read the requested range. This is because row keys are stored in sorted order (one of the basic guarantees of HBase, which is a BigTable-like store), so a scan can seek straight to STARTROW and stop at ENDROW. More explanation here: http://www.quora.com/How-feasible-is-real-time-querying-on-HBase-Can-it-be-achieved-through-a-programming-language-such-as-Python-PHP-or-JSP
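The difference can be modelled in a few lines of Python. This is a toy simulation, not HBase code: a sorted Python list stands in for the table's sorted row keys, and `bisect` stands in for HBase's ability to seek directly to a row key. The table contents are made up; only the prefix and the STARTROW/ENDROW values come from the question.

```python
from bisect import bisect_left

def filtered_full_scan(sorted_keys, prefix):
    """Toy model of a scan with a filter but no start/stop row:
    every row in the table is read, then tested against the filter."""
    examined = 0
    matches = []
    for key in sorted_keys:
        examined += 1
        if key.startswith(prefix):
            matches.append(key)
    return matches, examined

def range_scan(sorted_keys, start_row, end_row):
    """Toy model of a STARTROW/ENDROW scan: seek to start_row via
    binary search on the sorted keys, stop before end_row (exclusive)."""
    i = bisect_left(sorted_keys, start_row)
    examined = 0
    matches = []
    while i < len(sorted_keys) and sorted_keys[i] < end_row:
        examined += 1
        matches.append(sorted_keys[i])
        i += 1
    return matches, examined

# Hypothetical table: 100,000 unrelated keys plus the one row from the question.
keys = sorted([f"{n:015d}" for n in range(100_000)] + ["0000115831F8001"])

print(filtered_full_scan(keys, "0000115831F8"))
# → (['0000115831F8001'], 100001)
print(range_scan(keys, "0000115831F8", "0000115831F9"))
# → (['0000115831F8001'], 1)
```

Both scans return the same row, but the filtered scan touches every row while the range scan touches only the one inside the range, which is the shape of the 41 s vs 0.11 s gap in the question.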

[UPDATE 1] It turns out that PrefixFilter does read the table from the beginning until it gets past the prefix used in the filter; once it sees a row that sorts after the prefix, the scan stops. The recommendation for fast performance with a PrefixFilter therefore seems to be to also specify a start row (the start_row parameter, STARTROW in the shell) equal to the prefix. See the related 2013 discussion on the hbase-user mailing list.
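A second toy model (plain Python, not HBase's implementation) illustrates why a start row helps: a PrefixFilter-style scan reads and discards every row that sorts before the prefix, then ends early at the first row past the prefix. The `start_row` argument mirrors the recommendation above; the sample keys are illustrative assumptions.

```python
from bisect import bisect_left

def prefix_filter_scan(sorted_keys, prefix, start_row=None):
    """Toy model of a scan that uses a prefix filter.

    Without start_row the scan starts at the first row of the table.
    Rows before the prefix are read and discarded; the first row that
    sorts after the prefix ends the scan, since no later row can match.
    """
    i = 0 if start_row is None else bisect_left(sorted_keys, start_row)
    examined = 0
    matches = []
    while i < len(sorted_keys):
        key = sorted_keys[i]
        examined += 1
        if key.startswith(prefix):
            matches.append(key)
        elif key > prefix:
            break  # past the prefix: stop early
        i += 1
    return matches, examined

# Hypothetical table: rows before the prefix, two matching rows, rows after it.
before = [f"{n:015d}" for n in range(50_000)]
matching = ["0000115831F8001", "0000115831F8002"]
after = [f"9{n:014d}" for n in range(50_000)]
keys = sorted(before + matching + after)

print(prefix_filter_scan(keys, "0000115831F8"))
# → (['0000115831F8001', '0000115831F8002'], 50003)
print(prefix_filter_scan(keys, "0000115831F8", start_row="0000115831F8"))
# → (['0000115831F8001', '0000115831F8002'], 3)
```

The early stop saves the 50,000 rows after the prefix either way; only the start row avoids reading the 50,000 rows before it.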

[UPDATE 2, from @aaa90210] Regarding the update above, there is now an efficient row prefix filter that is much faster than PrefixFilter; see this answer: https://stackoverflow.com/a/38632100/150050
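As far as we understand it, the efficient option referred to here is the Java client's `Scan.setRowPrefixFilter(byte[])`, which turns the prefix into a start row and an exclusive stop row instead of adding a filter. The Python sketch below reproduces that conversion under this assumption; the function names are hypothetical and the code is not taken from HBase. For the prefix in the question it yields the same `ENDROW => '0000115831F9'` the asker wrote by hand.

```python
def closest_next_row_for_prefix(prefix: bytes) -> bytes:
    """Smallest row key that sorts after every key starting with prefix.

    Treats the prefix as a big unsigned number: drop trailing 0xFF bytes,
    then add one to the last remaining byte. An all-0xFF (or empty) prefix
    has no upper bound, so b"" is returned to mean "scan to end of table".
    """
    end = len(prefix)
    while end > 0 and prefix[end - 1] == 0xFF:
        end -= 1
    if end == 0:
        return b""
    return prefix[:end - 1] + bytes([prefix[end - 1] + 1])

def prefix_to_scan_range(prefix: bytes):
    """Return (start_row, stop_row) for a prefix scan; stop_row is exclusive."""
    return prefix, closest_next_row_for_prefix(prefix)

print(prefix_to_scan_range(b"0000115831F8"))
# → (b'0000115831F8', b'0000115831F9')
print(closest_next_row_for_prefix(b"ab\xff\xff"))
# → b'ac'
print(closest_next_row_for_prefix(b"\xff\xff"))
# → b''
```

Trailing 0xFF bytes are dropped before incrementing because 0xFF + 1 does not fit in a byte; an all-0xFF prefix therefore has no stop row, and the scan simply runs to the end of the table.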
