Accessing HBase table data from Hive based on Time Stamp
Problem description
I have created an HBase table with the default number of versions set to 10:
create 'tablename',{NAME => 'cf', VERSIONS => 10}
and inserted two rows (row1 and row2):
put 'tablename','row1','cf:id','row1id'
put 'tablename','row1','cf:name','row1name'
put 'tablename','row2','cf:id','row2id'
put 'tablename','row2','cf:name','row2name'
put 'tablename','row2','cf:name','row2nameupdate'
put 'tablename','row2','cf:name','row2nameupdateagain'
put 'tablename','row2','cf:name','row2nameupdateonemoretime'
I tried to select the data using scan:
scan 'tablename',{RAW => true, VERSIONS => 10}
I can see all versions of the data.
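To make the versioning behaviour concrete, here is a toy in-memory model written in plain Python. It is not the HBase client API. Each put adds a (timestamp, value) version to a cell, and at most `max_versions` versions are kept, newest first, loosely mirroring `VERSIONS => 10` on family `cf`. The class name `VersionedTable` and the timestamps 1..7 are hypothetical. Real HBase assigns millisecond server timestamps and discards excess versions lazily; this model trims them immediately.

```python
from collections import defaultdict


class VersionedTable:
    """Hypothetical toy model of HBase cell versioning (not the HBase API).

    Each (row, column) cell keeps up to max_versions (timestamp, value)
    pairs, newest first.
    """

    def __init__(self, max_versions: int = 10) -> None:
        self.max_versions = max_versions
        # (row, column) -> list of (timestamp, value), newest first
        self._cells = defaultdict(list)

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        versions = self._cells[(row, column)]
        # A put with an already-used timestamp replaces that version.
        versions[:] = [v for v in versions if v[0] != ts]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        # Simplification: drop anything beyond the newest max_versions now.
        del versions[self.max_versions:]

    def scan_all_versions(self):
        """Yield (row, column, ts, value) for every retained version."""
        for (row, column), versions in sorted(self._cells.items()):
            for ts, value in versions:
                yield row, column, ts, value


# Replay the puts from the question with hypothetical timestamps 1..7.
table = VersionedTable(max_versions=10)
puts = [
    ("row1", "cf:id", "row1id"),
    ("row1", "cf:name", "row1name"),
    ("row2", "cf:id", "row2id"),
    ("row2", "cf:name", "row2name"),
    ("row2", "cf:name", "row2nameupdate"),
    ("row2", "cf:name", "row2nameupdateagain"),
    ("row2", "cf:name", "row2nameupdateonemoretime"),
]
for ts, (row, column, value) in enumerate(puts, start=1):
    table.put(row, column, value, ts)

for cell in table.scan_all_versions():
    print(cell)
```

All four versions of `row2`/`cf:name` survive because the limit is 10. With a smaller limit, the oldest versions would be the ones dropped.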
Then I created a Hive external table that points to this HBase table:
CREATE EXTERNAL TABLE hive_timestampupdate(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "tablename");
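The storage handler pairs Hive columns with `hbase.columns.mapping` entries by position: `:key` is the row key and `cf:name` is a single family:qualifier. The mapping contains no timestamp entry, so no timestamp reaches Hive. The helper below is a hypothetical, simplified parser that shows this pairing. It only handles `:key` and `family:qualifier` entries. It ignores whole-family map columns (`cf:`) and storage suffixes such as `#b`.

```python
def map_columns(hive_columns, mapping):
    """Hypothetical helper: pair Hive columns with hbase.columns.mapping entries.

    Simplified: understands only ':key' and 'family:qualifier' entries.
    """
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    result = {}
    for hive_col, entry in zip(hive_columns, entries):
        if entry == ":key":
            result[hive_col] = "row key"
        else:
            family, _, qualifier = entry.partition(":")
            result[hive_col] = (family, qualifier)
    return result


print(map_columns(["key", "value"], ":key,cf:name"))
```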
When I query the table hive_timestampupdate, I can see the data from the HBase table:
select * from hive_timestampupdate;
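The `select *` above returns only one value per row: the newest version of `cf:name`. A small sketch of that projection is below. The function name `latest_only` and the timestamps are hypothetical. The input maps each row key to that column's (timestamp, value) versions.

```python
def latest_only(versions_by_row):
    """Hypothetical sketch: one (row_key, value) per row, newest version only."""
    rows = []
    for row_key in sorted(versions_by_row):
        versions = versions_by_row[row_key]
        if versions:  # this toy version skips rows with no versions
            _, newest_value = max(versions, key=lambda v: v[0])
            rows.append((row_key, newest_value))
    return rows


# cf:name versions from the question, with hypothetical timestamps.
name_versions = {
    "row1": [(2, "row1name")],
    "row2": [
        (4, "row2name"),
        (5, "row2nameupdate"),
        (6, "row2nameupdateagain"),
        (7, "row2nameupdateonemoretime"),
    ],
}
print(latest_only(name_versions))
```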
Now I want to query the data by timestamp. Is there a way to query the data based on the timestamps of the HBase table's cells?
Answer
Unfortunately, no. According to the Hive HBase Integration documentation, there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp.
There are some JIRAs discussing timestamp-related functionality, but they don't really do what you are asking, and they haven't been well received :(
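If reading the history outside Hive is acceptable, HBase itself can filter by timestamp. The HBase shell's `get` and `scan` accept `TIMERANGE => [min, max]` alongside `VERSIONS`, and HBase treats a time range as min-inclusive, max-exclusive. The Python sketch below only models that filter over one cell's (timestamp, value) versions. The function name `versions_in_range` and the sample timestamps are hypothetical. `max_versions=1` mirrors HBase returning a single version unless more are requested.

```python
def versions_in_range(versions, min_ts, max_ts, max_versions=1):
    """Hypothetical sketch: keep versions with min_ts <= ts < max_ts.

    Returns at most max_versions (timestamp, value) pairs, newest first.
    """
    if min_ts > max_ts:
        raise ValueError("min_ts must not exceed max_ts")
    matching = [v for v in versions if min_ts <= v[0] < max_ts]
    matching.sort(key=lambda v: v[0], reverse=True)
    return matching[:max_versions]


# cf:name versions of row2 from the question, with hypothetical timestamps.
row2_name = [
    (4, "row2name"),
    (5, "row2nameupdate"),
    (6, "row2nameupdateagain"),
    (7, "row2nameupdateonemoretime"),
]
print(versions_in_range(row2_name, 5, 7))
print(versions_in_range(row2_name, 5, 7, max_versions=10))
```

Because the upper bound is exclusive, the version at timestamp 7 is not returned even though 7 is the range's end.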