Group by in HBase


Problem description

I know almost nothing about HBase, so apologies for the basic questions.

Imagine I have a table of 100 billion rows with 10 int columns, one datetime column, and one string column.

  1. Does HBase allow querying this table and grouping the result by a key (even a composite key)?
  2. If so, does it have to run a map/reduce job to do it?
  3. How do you feed it the query?
  4. Can HBase in general perform real-time-like queries on a table?

Solution

Data aggregation in HBase intersects with the need for "real-time analytics". HBase was not built for this kind of functionality, but there is a lot of demand for it, so a number of ways to do it exist or are being developed.
1) Register the HBase table as an external table in Hive and do the aggregations there. The data is accessed via the HBase API, which is not that efficient. The "Configuring Hive with HBase" discussion covers how this can be set up. This is the most powerful way to group by over HBase data. It does imply running MR jobs, but they are run by Hive, not by HBase.
2) You can write your own MR job that works with the HBase data sitting in HFiles in HDFS. This is the most efficient way, but it is not simple, and the data you process will be somewhat stale (writes that have not yet been flushed to HFiles are not seen). It is the most efficient because the data is not transferred via the HBase API; instead it is read directly from HDFS in a sequential manner.
3) The next version of HBase will contain coprocessors, which will be able to perform aggregations inside specific regions. You can think of them as a kind of stored procedure in the RDBMS world.
4) An in-memory, inter-region MR job that is parallelized within one node is also planned for future HBase releases. It will enable somewhat more advanced analytical processing than coprocessors.
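
To make the "aggregate inside each region, then combine" idea from option 3 more concrete, here is a rough plain-Python simulation. It is not HBase code and uses no HBase API: the regions are just lists, and the "group|id" composite row-key layout, the sample rows, and the function names are all assumptions made up for illustration. The same two-step shape (partial sums per chunk, then a merge) is also roughly what an MR job with a combiner would do.

```python
from collections import defaultdict


def group_key(row_key):
    """Extract the grouping part of a hypothetical 'group|id' composite row key."""
    return row_key.split("|", 1)[0]


def aggregate_region(rows):
    """Sum values per group within one region.

    This stands in for the work that would run server-side, next to the data.
    """
    partial = defaultdict(int)
    for row_key, value in rows:
        partial[group_key(row_key)] += value
    return dict(partial)


def merge_partials(partials):
    """Combine the per-region partial sums into the final per-group totals."""
    total = defaultdict(int)
    for partial in partials:
        for key, subtotal in partial.items():
            total[key] += subtotal
    return dict(total)


# Two simulated regions. Rows are sorted by row key, and group "b"
# happens to span the region boundary, so it needs the merge step.
regions = [
    [("a|001", 5), ("a|002", 7), ("b|001", 1)],
    [("b|002", 4), ("c|001", 10)],
]

result = merge_partials(aggregate_region(rows) for rows in regions)
print(result)
```

The point of the split is that only the small per-region dictionaries would travel over the network, not the rows themselves; the client-side merge is still needed because a group can cross a region boundary.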
