HBase: storing two or more values in a particular column for the same row key (Scala/Java API)

Question

I have a file with the following contents:

UserID   Email             
1001     abc@yahoo.com     
1001     def@gmail.com     
1002     gft@gmail.com
1002     rtf@yahoo.com

I want to store the data like this:

ROW          COLUMN+CELL                                                                                   
1001         column=cf:Email, timestamp=1487917201278, value=abc@yahoo.com 
1001         column=cf:Email, timestamp=1487917201279, value=def@gmail.com                                                                                                
1002         column=cf:Email, timestamp=1487917201286, value=gft@gmail.com
1002         column=cf:Email, timestamp=1487917201287, value=rtf@yahoo.com

I am using put, for example put 'table', '1001', 'cf:Email', 'def@gmail.com', but it gives me:

ROW          COLUMN+CELL                                                                                    
1001         column=cf:Email, timestamp=1487917201279, value=def@gmail.com                                                                                                
1002         column=cf:Email, timestamp=1487917201286, value=rtf@yahoo.com

It overwrites the previous value. But HBase is supposed to store multiple values for a particular column, distinguished by timestamp. Is there any way I can store both email addresses for a particular UserID?

Recommended answer

You may want to take a closer look at the HBase documentation on versions. Note especially where it says:

By default, i.e. if you specify no explicit version, when doing a get, the cell whose version has the largest value is returned
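To see why the second put appears to overwrite the first, here is a toy in-memory model of a single cell, written in plain Java. It is not HBase code: the VersionedCell class and its methods are made up for illustration, and it assumes the column family keeps only one version (which I believe is the default in current HBase releases). Real HBase hides or compacts away excess versions rather than dropping them at write time, so treat this only as a sketch of the visible behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Toy model of one HBase cell (row + column). Hypothetical class, not the HBase API.
class VersionedCell {
    private final int maxVersions;
    // timestamp -> value, ordered with the newest timestamp first
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Store a value under a timestamp, then discard the oldest versions beyond the limit.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry holds the oldest timestamp
        }
    }

    // Default read: only the newest version, or null if the cell is empty.
    String get() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Read up to n versions, newest first.
    List<String> get(int n) {
        List<String> out = new ArrayList<>();
        for (String v : versions.values()) {
            if (out.size() == n) break;
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        // One retained version: the older email is gone after the second put.
        VersionedCell one = new VersionedCell(1);
        one.put(1487917201278L, "abc@yahoo.com");
        one.put(1487917201279L, "def@gmail.com");
        System.out.println(one.get(2)); // [def@gmail.com]

        // Two retained versions: both survive, but a default read still returns the newest.
        VersionedCell two = new VersionedCell(2);
        two.put(1487917201278L, "abc@yahoo.com");
        two.put(1487917201279L, "def@gmail.com");
        System.out.println(two.get());  // def@gmail.com
        System.out.println(two.get(2)); // [def@gmail.com, abc@yahoo.com]
    }
}
```

So even with more versions retained, a reader has to ask for them explicitly; a plain get or scan keeps showing one value per column.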

But I wouldn't use multiple versions to store multiple values this way. You have to explicitly set the maximum number of versions on the column family, and that setting applies to every column in the family. I would be more inclined to use distinct column names (such as Email1, Email2, ...).
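A minimal sketch of the distinct-column-name idea, in plain Java with no HBase dependency: it groups the (UserID, Email) records from the question and assigns each email its own qualifier. The EmailColumns class and toColumns method are hypothetical names chosen for this example. In a real client, each resulting (qualifier, value) pair would become one Put.addColumn(family, qualifier, value) call on the Put for that row key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assigns qualifiers Email1, Email2, ... per user ID.
class EmailColumns {
    // records: {userId, email} pairs in file order.
    // Returns row key -> (column qualifier -> value), preserving input order.
    static Map<String, Map<String, String>> toColumns(String[][] records) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (String[] rec : records) {
            Map<String, String> cols = rows.computeIfAbsent(rec[0], k -> new LinkedHashMap<>());
            // The next qualifier number is one more than the columns seen so far for this row.
            cols.put("Email" + (cols.size() + 1), rec[1]);
        }
        return rows;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"1001", "abc@yahoo.com"},
            {"1001", "def@gmail.com"},
            {"1002", "gft@gmail.com"},
            {"1002", "rtf@yahoo.com"},
        };
        // Prints one line per cell, e.g. "1001  cf:Email1  abc@yahoo.com"
        toColumns(records).forEach((row, cols) ->
            cols.forEach((qualifier, value) ->
                System.out.println(row + "  cf:" + qualifier + "  " + value)));
    }
}
```

Note that the numbering here only looks at the batch in memory. If emails are added to an existing row later, you would need to know how many Email columns the row already has, or pick a qualifier scheme that does not depend on a counter.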
