HBase: storing two or more values in a particular column for the same row key (Scala/Java API)
Question
I have a file with the following contents:
UserID Email
1001 abc@yahoo.com
1001 def@gmail.com
1002 gft@gmail.com
1002 rtf@yahoo.com
I want to store the data like this:
ROW COLUMN+CELL
1001 column=cf:Email, timestamp=1487917201278, value=abc@yahoo.com
1001 column=cf:Email, timestamp=1487917201279, value=def@gmail.com
1002 column=cf:Email, timestamp=1487917201286, value=gft@gmail.com
1002 column=cf:Email, timestamp=1487917201287, value=rtf@yahoo.com
I am using put, for example:
put 'table', '1001', 'cf:Email', 'def@gmail.com'
but it is giving me:
ROW COLUMN+CELL
1001 column=cf:Email, timestamp=1487917201279, value=def@gmail.com
1002 column=cf:Email, timestamp=1487917201286, value=rtf@yahoo.com
It is overwriting the previous value. But HBase is supposed to store multiple values for a particular column, distinguished by timestamp. Is there any way I can store both email addresses for a particular UserID?
Recommended answer
You may want to take a closer look at the HBase documentation on versions. Note especially where it says:
"By default, i.e. if you specify no explicit version, when doing a get, the cell whose version has the largest value is returned."
But I wouldn't use multiple versions to store multiple values this way. You have to explicitly specify the maximum number of versions, and that setting applies to every column in the family. I would be more inclined to use distinct column names (such as Email1, Email2, ...).
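The distinct-column suggestion can be sketched in plain Java as follows. This is only an illustration, not HBase client code: the class EmailColumns and the method assignQualifiers are hypothetical names, the input is assumed to be the "UserID Email" lines from the question with the header row already removed, and the table name 'table' and family 'cf' are taken from the question. The sketch numbers the emails per row key and prints the equivalent shell put commands.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EmailColumns {
    /** One cell to write: row key, column qualifier, and value (hypothetical helper type). */
    public static final class Cell {
        public final String rowKey;
        public final String qualifier;
        public final String value;

        Cell(String rowKey, String qualifier, String value) {
            this.rowKey = rowKey;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /**
     * Assigns qualifiers Email1, Email2, ... to the emails of each row key,
     * in input order. Each line is expected to be "UserID Email",
     * separated by whitespace; malformed lines are skipped.
     */
    public static List<Cell> assignQualifiers(List<String> lines) {
        Map<String, Integer> countPerRow = new HashMap<>();
        List<Cell> cells = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // not a "UserID Email" pair
            }
            // merge returns the updated count for this row key (1 on first sight)
            int n = countPerRow.merge(parts[0], 1, Integer::sum);
            cells.add(new Cell(parts[0], "Email" + n, parts[1]));
        }
        return cells;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "1001 abc@yahoo.com",
            "1001 def@gmail.com",
            "1002 gft@gmail.com",
            "1002 rtf@yahoo.com");
        // Print one shell-style put per cell; 'table' and 'cf' come from the question.
        for (Cell c : assignQualifiers(lines)) {
            System.out.println("put 'table', '" + c.rowKey + "', 'cf:"
                + c.qualifier + "', '" + c.value + "'");
        }
    }
}
```

Because each email ends up in its own column, a later put for the same row key no longer replaces the earlier address, and no change to the family's version settings is needed. In a real client the same (row key, qualifier, value) triples would be passed to the HBase Put API instead of being printed.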