HBase scan operation caching

Question

What is the difference between setCaching and setBatch in the HBase scan mechanism? Which one should I use for the best performance when scanning large volumes of data?

Answer

Unless you have super-wide tables with many columns (or very large ones), you should completely forget about setBatch() and focus exclusively on setCaching():


setCaching(int caching)

Set the number of rows for caching that will be passed to scanners. If not set, the Configuration setting HConstants.HBASE_CLIENT_SCANNER_CACHING will apply. Higher caching values will enable faster scanners but will use more memory.
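To make the speed/memory trade-off concrete, here is a rough sketch in Java that does not touch HBase itself. It assumes a simplified model in which each round trip to the server returns at most `caching` rows; the class and method names are hypothetical, and a real scanner makes extra calls to open and close and may also be bounded by size limits.

```java
// Simplified model (an assumption, not HBase code): each client/server
// round trip fetches at most `caching` rows.
class ScanCachingModel {

    // Round trips needed to fetch `rows` rows, ignoring the extra calls
    // a real scanner makes to open and close.
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching > 0");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching
                    + " -> round trips=" + roundTrips(rows, caching));
        }
    }
}
```

Under this model, raising the caching value from 1 to 1000 cuts the round trips for a million rows from 1,000,000 to 1,000, at the cost of holding up to 1000 rows in client memory at a time.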

setBatch(int batch)

Set the maximum number of values to return for each call to next()


setBatch limits how many values from a single row are returned on each call/iteration, so a wide row is delivered in several pieces instead of all at once. Here's a nice post about it: http://blog.jdwyah.com/2013/08/hbase-scan-batch-vs-cache.html
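The effect of a batch limit on one wide row can be sketched in plain Java, again without the HBase client. This is only an illustration under the assumption that a row's values are handed out in order, at most `batch` per call; `ScanBatchModel` and `splitRow` are hypothetical names, and the cell labels are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only (not HBase code): one row's values are split into
// chunks of at most `batch` values, one chunk per next() call.
class ScanBatchModel {

    static List<List<String>> splitRow(List<String> cells, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be > 0");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < cells.size(); start += batch) {
            int end = Math.min(start + batch, cells.size());
            // Copy the slice so each chunk is independent of the input list.
            chunks.add(new ArrayList<>(cells.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("c1", "c2", "c3", "c4", "c5");
        // A five-value row with batch = 2 arrives in three pieces.
        System.out.println(splitRow(row, 2)); // prints [[c1, c2], [c3, c4], [c5]]
    }
}
```

For rows with only a handful of columns, every row fits in one piece regardless of the batch value, which is why the setting matters mainly for very wide rows.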
