HBase data only in the key + compound key & wildcards


Question

I could not find this covered in the existing topics, so here are my questions:

1) Have you tried putting all the data in the rowkey only? I have really small rows of data (but millions of them) and need to combine several of the data entities to make the key unique, so my idea was to create a compound key from everything I need to store in HBase. Have you tried it, and what do you think the bottleneck or problem might be? What should be taken into consideration? I can imagine that this would need more RAM, since I will have more data to put into the bloom filters.

2) I just want confirmation of this, because I could not find it stated in this form. As far as I understand HBase, if I have a compound key, let's say: key: k1_k2_x

I could do a range scan to get all k2 entries for a particular k1, e.g.: scan "t1", {STARTROW => "k1_"}

but there is no way to use a wildcard and somehow get all k1 entries for a particular k2. I would need a map/reduce job or Hive or a filter for this, right?

Thanks in advance, and sorry if there is already a topic where these questions were answered in some form.

Cheers

Answer

1) It's perfectly fine to put all your data in the row-key. HBase is designed to support use cases like this.

2) If you want to do range scans (or "wildcard scans") on both k1 and k2, I recommend storing the data in two tables like this:

  • table1: k1_k2_x
  • table2: k2_k1_x

This duplicates the data, but it is very efficient for the sort of queries you want.
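To make the two-table idea concrete, here is a minimal sketch in plain Python rather than against a real HBase cluster. `SortedTable` and `write_both` are hypothetical names invented for this illustration; the class only mimics the one property that matters here, namely that row keys are kept in sorted byte order, so every key sharing a prefix sits in one contiguous range.

```python
import bisect


class SortedTable:
    """In-memory stand-in for a table whose row keys are kept in byte order.

    Hypothetical helper for illustration only; it is not an HBase client.
    """

    def __init__(self):
        self._keys = []

    def put(self, key: bytes) -> None:
        # Insert the key at its sorted position, ignoring exact duplicates.
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def prefix_scan(self, prefix: bytes) -> list:
        # Jump to the first key >= prefix, then read until the prefix stops
        # matching. Only the matching range is touched, not the whole table.
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            matches.append(key)
        return matches


def write_both(table1: SortedTable, table2: SortedTable,
               k1: str, k2: str, x: str) -> None:
    # Write the same entry under both key orders: k1_k2_x and k2_k1_x.
    table1.put("_".join((k1, k2, x)).encode("utf-8"))
    table2.put("_".join((k2, k1, x)).encode("utf-8"))


if __name__ == "__main__":
    t1, t2 = SortedTable(), SortedTable()
    write_both(t1, t2, "a", "m", "1")
    write_both(t1, t2, "a", "n", "2")
    write_both(t1, t2, "b", "m", "3")
    print(t1.prefix_scan(b"a_"))  # all entries for k1 = "a"
    print(t2.prefix_scan(b"m_"))  # all entries for k2 = "m"
```

In a real deployment the two writes go to two separate tables, and HBase does not make them atomic as a pair, so the application has to decide how to handle a failure between the first and second put.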

This is one of the tradeoffs with HBase: you get really large scaling capabilities, but lose RDBMS features, and need to work out efficient ways of inserting/querying through your row-key structure.
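As a rough illustration of what "working out your row-key structure" can involve, the sketch below builds and splits a compound key that uses the `_` separator from the question. The function names and the rule of rejecting parts that contain the separator are assumptions made for this example, not something HBase itself requires.

```python
SEPARATOR = "_"  # same delimiter as in the k1_k2_x example


def make_row_key(*parts: str) -> bytes:
    """Join the key parts into one compound row key (hypothetical helper)."""
    for part in parts:
        # Assumption for this sketch: a part containing the separator would
        # make the key ambiguous when split again, so it is rejected.
        if SEPARATOR in part:
            raise ValueError(f"key part {part!r} contains {SEPARATOR!r}")
    return SEPARATOR.join(parts).encode("utf-8")


def split_row_key(key: bytes) -> list:
    """Recover the individual parts from a compound row key."""
    return key.decode("utf-8").split(SEPARATOR)


if __name__ == "__main__":
    key = make_row_key("k1", "k2", "x")
    print(key)
    print(split_row_key(key))
```

If the parts can contain arbitrary characters, fixed-width or length-prefixed fields are a common alternative to a delimiter. Note also that HBase only stores a row that has at least one cell, so a design that keeps everything in the key still typically writes one cell, often with an empty value.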
