How does HBase enable random access to HDFS?


Question



Given that HBase is a database with its files stored in HDFS, how does it enable random access to a single piece of data within HDFS? By what method is this accomplished?

From the Apache HBase Reference Guide:

HBase internally puts your data in indexed "StoreFiles" that exist on HDFS for high-speed lookups. See the Chapter 5, Data Model and the rest of this chapter for more information on how HBase achieves its goals.

Scanning both chapters didn't reveal a high-level answer for this question.

So how does HBase enable random access to files stored in HDFS?

Answer

HBase stores data in HFiles whose contents are sorted (indexed) by key. Given a random key, the client can determine which region server to ask for the row. The region server can determine which region holds the row, and then do a binary search through that region's data to reach the correct row. This works because each file keeps enough metadata to know the number of blocks, the block size, and the start and end keys.
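The client-side step (finding the region, and therefore the region server, that holds a key) is a binary search over region start keys. Below is a minimal Python sketch under simplifying assumptions: the region list is an in-memory list sorted by start key, and `RegionInfo`, `locate_region`, and the server names are hypothetical. A real HBase client looks regions up in the `hbase:meta` table and caches the locations it has found.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionInfo:
    # Hypothetical, simplified region descriptor covering [start_key, end_key).
    start_key: bytes  # b"" means "from the beginning of the table"
    end_key: bytes    # b"" means "to the end of the table"
    server: str       # region server currently hosting this region


def locate_region(regions: list[RegionInfo], row_key: bytes) -> RegionInfo:
    """Binary-search regions (sorted by start_key) for the one containing row_key."""
    starts = [r.start_key for r in regions]
    # Index of the last region whose start_key <= row_key.
    i = bisect_right(starts, row_key) - 1
    if i < 0:
        raise KeyError(row_key)
    region = regions[i]
    # An empty end_key means the region runs to the end of the table.
    if region.end_key and row_key >= region.end_key:
        raise KeyError(row_key)
    return region


regions = [
    RegionInfo(b"", b"g", "rs1"),
    RegionInfo(b"g", b"p", "rs2"),
    RegionInfo(b"p", b"", "rs3"),
]

print(locate_region(regions, b"apple").server)  # rs1
print(locate_region(regions, b"hbase").server)  # rs2
print(locate_region(regions, b"zebra").server)  # rs3
```

Each region covers the half-open range [start_key, end_key), and an empty start or end key marks the first or last region of the table, so every possible row key falls into exactly one region.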

For example, a table may contain 10 TB of data, but the table is split into regions of 4 GB each. Each region has a start key and an end key, so the client can get the list of regions for the table and determine which region holds the key it is looking for. A region's files are in turn broken up into blocks, so the region server can binary-search over its blocks. A block is essentially a long sorted list of (key, attribute, value, version) entries. If you know the starting key of each block, you can determine which file to access and the byte offset (block) at which to start reading, and continue the search from there. HDFS files cannot be modified in place, but they can be read starting at any byte offset, so only the block that may contain the key has to be read rather than the whole file.
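The block-level step can be sketched the same way. This minimal Python sketch writes sorted key/value records into blocks, keeps an index of each block's first key and byte offset, and answers a point lookup by binary-searching that index, seeking to one offset, and scanning just that block. It assumes a made-up record layout and hypothetical helpers (`write_blocked_file`, `get`); `io.BytesIO` stands in for an HDFS file, which likewise supports seeking to an offset before reading. It is not the real HFile format, which is considerably more involved.

```python
import io
import struct
from bisect import bisect_right


def write_blocked_file(records, records_per_block):
    """Write sorted (key, value) records into blocks; return (file, block_index).

    block_index holds one (first_key, byte_offset, byte_length) tuple per block.
    """
    buf = io.BytesIO()
    index = []
    for start in range(0, len(records), records_per_block):
        block = records[start:start + records_per_block]
        offset = buf.tell()
        for key, value in block:
            # Hypothetical entry layout: 2-byte key length, 2-byte value length,
            # then the key bytes and the value bytes.
            buf.write(struct.pack(">HH", len(key), len(value)))
            buf.write(key)
            buf.write(value)
        index.append((block[0][0], offset, buf.tell() - offset))
    return buf, index


def get(f, index, key):
    """Point lookup: binary-search the block index, then read only one block."""
    first_keys = [first for first, _, _ in index]
    i = bisect_right(first_keys, key) - 1
    if i < 0:
        return None  # key sorts before the first block
    _, offset, length = index[i]
    f.seek(offset)         # jump straight to the block's byte offset
    data = f.read(length)  # read just this block, not the whole file
    pos = 0
    while pos < len(data):
        klen, vlen = struct.unpack_from(">HH", data, pos)
        pos += 4
        k = data[pos:pos + klen]
        pos += klen
        v = data[pos:pos + vlen]
        pos += vlen
        if k == key:
            return v
        if k > key:
            break  # entries are sorted, so the key is not present
    return None


records = [(f"row{i:04d}".encode(), f"value-{i}".encode()) for i in range(1000)]
f, index = write_blocked_file(records, records_per_block=64)
print(len(index))                  # 16
print(get(f, index, b"row0500"))   # b'value-500'
print(get(f, index, b"row9999"))   # None
```

Only the block index and a single block are touched per lookup, so in this sketch the cost grows with the logarithm of the number of blocks plus the size of one block, not with the size of the file.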
