How to Create Partitions in an HBase Table Like Hive Table Partitions


Question

We are planning to migrate from CDH3 to CDH4. As part of this migration we also plan to bring HBase into our system, because it supports updates to the data. In CDH3 we use Hive as our warehouse.

This is where we hit the major problem in the migration: Hive supports partitioned tables. Our system has many tables with different schemas, and some of them are partitioned by date. We have five years of data history, so some tables have 365 * 5 partitions.

We want the same behavior in HBase, but when I searched I couldn't find a way to create partitions in HBase. Can anyone help me implement this kind of partitioned table in HBase?

The reason we are going with HBase is that it supports updates.

If HBase does not support this, which other system (such as MongoDB or Cassandra) supports this behavior?

Even a workaround would be a great help.

Recommended answer

HBase has a notion close to a partition, called a region. However, regions don't work like Hive (or RDBMS) partitions. Each region holds a range of keys, and you can break a key range into smaller regions by splitting it. For example, if your original region holds keys 0-9, you can divide it into two smaller regions, 0-4 and 5-9, or into ten regions, 0, 1, 2, and so on.
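To make the key-range idea concrete, here is a minimal sketch in plain Python (no HBase client involved). It models a table as a sorted list of split points and looks up which region would hold a given row key, assuming the usual convention that a region's start key is inclusive and its end key is exclusive. The names `region_for`, `two_regions`, and `ten_regions` are made up for this illustration.

```python
from bisect import bisect_right
from typing import List


def region_for(row_key: bytes, split_points: List[bytes]) -> int:
    """Return the index of the region whose key range contains row_key.

    Region i covers [split_points[i - 1], split_points[i]). The first region
    has no lower bound and the last region has no upper bound.
    Keys are compared byte by byte, which is how HBase orders row keys.
    """
    return bisect_right(split_points, row_key)


# One key space 0-9 split into two regions: 0-4 and 5-9.
two_regions = [b"5"]

# The same key space split into ten regions, one per leading digit.
ten_regions = [str(digit).encode("ascii") for digit in range(1, 10)]

if __name__ == "__main__":
    for key in (b"3", b"5", b"7"):
        print(key, region_for(key, two_regions), region_for(key, ten_regions))
```

Note that N regions need only N - 1 split points, which is why `ten_regions` has nine entries.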

If you make your key composite, with the date as the first part followed by whatever your key is today, you can pre-split the HBase table so that each day gets one or more regions.
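As a rough sketch of that idea, the Python below builds date-prefixed row keys and one split point per day. The key layout (`YYYYMMDD|id`), the `|` separator, and the function names are assumptions chosen for illustration, not something the answer prescribes. The resulting split points could then be passed to table creation, for example through the `SPLITS` option of the HBase shell's `create` command.

```python
from datetime import date, timedelta
from typing import List


def make_row_key(day: date, entity_id: str) -> bytes:
    """Composite row key: the date comes first, then the existing key."""
    return f"{day:%Y%m%d}|{entity_id}".encode("utf-8")


def daily_split_points(first_day: date, last_day: date) -> List[bytes]:
    """One split point per day, so each day lands in its own region.

    The first day needs no split point because the first region has no
    lower bound.
    """
    points = []
    day = first_day + timedelta(days=1)
    while day <= last_day:
        points.append(f"{day:%Y%m%d}".encode("utf-8"))
        day += timedelta(days=1)
    return points


if __name__ == "__main__":
    splits = daily_split_points(date(2013, 1, 1), date(2013, 1, 4))
    print(splits)
    print(make_row_key(date(2013, 1, 2), "order-17"))
```

Because the date is a fixed-width prefix, every key for a given day sorts between that day's split point and the next day's, so it falls in that day's region. With five years of daily data this gives roughly the same 365 * 5 layout as the Hive tables in the question.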

You should note, however, that a key whose most significant bytes are sequential will slow down your writes, because all new rows land in the same region (this may not be a problem if you're doing one-time loads). This problem is called a "hot spot". You can read about it, and about a sample approach to overcoming it, in a blog post by Alex Baranau from Sematext.
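One commonly described way to spread sequential keys is to prefix each key with a small bucket number ("salting"). The sketch below illustrates that general technique in plain Python. It is not taken from the blog post mentioned above, and the bucket count, key layout, and function names are all assumptions.

```python
import zlib
from typing import List

NUM_BUCKETS = 8  # assumed value; pick it to match the number of write targets


def salted_row_key(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with a bucket derived from a hash of the whole key.

    Rows written on the same day then spread over num_buckets key ranges
    instead of all going to the region that holds the newest date.
    """
    bucket = zlib.crc32(row_key) % num_buckets
    return f"{bucket:02d}|".encode("ascii") + row_key


def scan_prefixes_for_day(day: str, num_buckets: int = NUM_BUCKETS) -> List[bytes]:
    """Key prefixes to scan in order to read back one day of data."""
    return [f"{bucket:02d}|{day}".encode("ascii") for bucket in range(num_buckets)]


if __name__ == "__main__":
    print(salted_row_key(b"20130102|order-17"))
    print(scan_prefixes_for_day("20130102"))
```

The trade-off is on the read side: fetching one day now takes one scan per bucket rather than a single range scan, so this only pays off when write throughput matters more than simple date-range reads.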
