Accessing HBase table data from Hive based on Time Stamp

Problem description

I created an HBase table whose column family keeps up to 10 versions:

create 'tablename',{NAME => 'cf', VERSIONS => 10}

Then I inserted two rows (row1 and row2), updating cf:name in row2 several times:

put 'tablename','row1','cf:id','row1id'
put 'tablename','row1','cf:name','row1name'
put 'tablename','row2','cf:id','row2id'
put 'tablename','row2','cf:name','row2name'
put 'tablename','row2','cf:name','row2nameupdate'
put 'tablename','row2','cf:name','row2nameupdateagain'
put 'tablename','row2','cf:name','row2nameupdateonemoretime'

I then read the data back with a raw scan:

scan 'tablename',{RAW => true, VERSIONS => 10}

I can see all versions of the data.
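
To make the versioning behaviour concrete, here is a minimal Python sketch of a column family that keeps several versions per cell. It is a toy model, not HBase code: the VersionedTable class, its methods, and the logical clock that stands in for server-assigned timestamps are all assumptions made for illustration, and details of real HBase (such as when old versions are actually discarded) are ignored.

```python
from collections import defaultdict


class VersionedTable:
    """Toy in-memory model of a column family that keeps several versions per cell."""

    def __init__(self, max_versions=10):
        self.max_versions = max_versions
        # (row, column) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)
        # Hypothetical logical clock standing in for server-assigned timestamps
        self._clock = 0

    def put(self, row, column, value):
        """Store a new version of the cell instead of overwriting the old one."""
        self._clock += 1
        versions = self.cells[(row, column)]
        versions.insert(0, (self._clock, value))
        # Keep only the newest max_versions entries
        del versions[self.max_versions:]

    def scan_all_versions(self):
        """Yield every stored version of every cell, newest first within a cell."""
        for (row, column), versions in sorted(self.cells.items()):
            for timestamp, value in versions:
                yield row, column, timestamp, value


table = VersionedTable(max_versions=10)
table.put("row1", "cf:id", "row1id")
table.put("row1", "cf:name", "row1name")
table.put("row2", "cf:id", "row2id")
table.put("row2", "cf:name", "row2name")
table.put("row2", "cf:name", "row2nameupdate")
table.put("row2", "cf:name", "row2nameupdateagain")
table.put("row2", "cf:name", "row2nameupdateonemoretime")

for cell in table.scan_all_versions():
    print(cell)
```

In this model the cell row2/cf:name ends up with four versions, each tagged with its own timestamp, which is the situation the scan above reveals.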

Next I created a Hive external table that points to this HBase table:

-- key is declared as string because the row keys ('row1', 'row2') are not integers
CREATE EXTERNAL TABLE hive_timestampupdate(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "tablename");

When I query the table hive_timestampupdate, I can see the data from the HBase table.

select * from hive_timestampupdate;

Now I want to query the data by timestamp. Is there a way to query it based on the timestamps stored in the HBase table?

Solution

Unfortunately, no. According to the Hive HBase Integration document,

there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp.
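
The gap between a latest-version read and a timestamp-based read can be sketched in plain Python. This is only an illustration, not Hive or HBase code: the timestamps are invented, and the helper names latest and in_time_range are hypothetical. The half-open interval mirrors how HBase time ranges behave (minimum inclusive, maximum exclusive).

```python
# Hypothetical versions of the cell ('row2', 'cf:name') as (timestamp_ms, value).
# The timestamps are invented for illustration.
versions = [
    (1000, "row2name"),
    (2000, "row2nameupdate"),
    (3000, "row2nameupdateagain"),
    (4000, "row2nameupdateonemoretime"),
]


def latest(cell_versions):
    """Return the newest version only, which is what a latest-timestamp read sees."""
    return max(cell_versions, key=lambda tv: tv[0])


def in_time_range(cell_versions, min_ts, max_ts):
    """Return versions with min_ts <= timestamp < max_ts, newest first."""
    selected = [tv for tv in cell_versions if min_ts <= tv[0] < max_ts]
    return sorted(selected, key=lambda tv: tv[0], reverse=True)


print(latest(versions))
print(in_time_range(versions, 2000, 4000))
```

The first call corresponds to what the Hive mapping returns for the cell. The second is the kind of filter the question asks for; on the HBase side the shell's scan command accepts a TIMERANGE option for this purpose.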

There are some JIRAs talking about timestamp related functionality, but they don't really do what you are asking, and they haven't gotten a great reception :(
