HBase data only in key + compound key & wildcards

Question

Here is my question:

1) Have you tried putting all the data in the row key only? I have really small rows (but millions of them) and need to combine several data entities to make the key unique, so my idea was to create a compound key out of everything I need to store in HBase. Have you tried this, and what do you think the bottleneck or problem might be? What should be taken into consideration? I imagine this would need more RAM, since there will be more to put into the Bloom filters.

2) I just want a confirmation for this, because I could not find it stated in this form. As far as I understand HBase, if I have a compound key, let's say: key: k1_k2_x

I could do a range scan to get all k2 entries for a particular k1, e.g.: scan "t1", {STARTROW => "k1_"}
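
A caveat on that shell command, with a small simulation. In the HBase shell, STARTROW only sets where the scan begins; without a STOPROW (or the ROWPREFIXFILTER option available in newer HBase versions) the scan keeps going to the end of the table. The Python sketch below stands in for HBase with an in-memory sorted list; scan and prefix_stop_row are hypothetical helpers, not HBase APIs.

```python
import bisect

# Stand-in for an HBase table: hypothetical k1_k2_x row keys kept sorted as raw
# bytes, the same lexicographic order HBase uses for row keys.
rows = sorted([b"a_x_1", b"a_y_2", b"ab_z_3", b"b_x_4", b"b_y_5"])


def scan(sorted_keys, start_row, stop_row=None):
    """Mimic a scan: start_row inclusive, stop_row exclusive (or table end)."""
    lo = bisect.bisect_left(sorted_keys, start_row)
    hi = len(sorted_keys) if stop_row is None else bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


def prefix_stop_row(prefix):
    """Smallest key above every key that starts with prefix.

    Assumes the last byte is not 0xFF, which holds for a "_"-terminated prefix.
    """
    return prefix[:-1] + bytes([prefix[-1] + 1])


# STARTROW alone: begins at "a_" but runs on to the end of the table.
print(scan(rows, b"a_"))
# Bounded by a stop row: exactly the rows whose k1 is "a".
print(scan(rows, b"a_", prefix_stop_row(b"a_")))
```

Keeping the separator in the prefix matters: scanning with the bare prefix "a" (stop row "b") would also return "ab_z_3", whose k1 is "ab", not "a".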

but there is no way to use a wildcard and somehow get all k1 entries for a particular k2. I would need a MapReduce job, Hive, or a filter for this, right?
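
As far as plain key scans go, that matches how HBase orders rows: with k1 leading the key, rows that share a k2 are not contiguous, so no start/stop range can isolate them. A server-side filter such as HBase's RowFilter with a RegexStringComparator can select them, but the scan still reads every row it covers. The Python sketch below uses in-memory data and a hypothetical function rows_with_k2 to make that cost visible by counting the rows examined.

```python
# Stand-in table of hypothetical k1_k2_x row keys, sorted as HBase sorts them.
rows = sorted([b"a_x_1", b"a_y_2", b"b_x_3", b"c_x_4", b"c_z_5"])


def rows_with_k2(sorted_keys, k2, sep=b"_"):
    """Return (matching keys, rows examined) for a "*_k2_*" style query.

    k2 is not a key prefix, so matches are scattered over the whole key space
    and every row must be examined: the in-memory equivalent of a full-table
    scan with a row filter applied to each key.
    """
    matches, examined = [], 0
    for key in sorted_keys:
        examined += 1
        parts = key.split(sep)
        if len(parts) >= 2 and parts[1] == k2:
            matches.append(key)
    return matches, examined


matches, examined = rows_with_k2(rows, b"x")
print(matches, examined)
```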

Recommended answer

1) It's perfectly fine to put all your data in the row key. HBase is designed to support use cases like this.
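
As a rough sketch of such a key-only layout, assuming the "_"-separated key format from the question, the Python below builds and splits compound keys. make_row_key, split_row_key, and the column family name "f" are hypothetical, and a dict stands in for a real table. One practical point it encodes: HBase stores a row only through its cells, so a key-only row still carries one cell, typically with an empty qualifier and an empty value.

```python
# Hypothetical helpers (not part of any HBase client API) for a key-only
# layout: every field lives in the compound row key and the cell value is empty.
SEP = "_"  # separator from the question; it must never occur inside a component


def make_row_key(*parts):
    """Join the components into one compound row key (HBase keys are raw bytes)."""
    for part in parts:
        if SEP in part:
            raise ValueError(f"component {part!r} contains the separator {SEP!r}")
    return SEP.join(parts).encode("utf-8")


def split_row_key(key):
    """Recover the individual components from a compound row key."""
    return key.decode("utf-8").split(SEP)


# Table modelled as row key -> {column: value}. Each row gets a single cell in
# the hypothetical family "f" with an empty qualifier and an empty value.
table = {}
for k1, k2, x in [("user42", "2013-05-01", "click"), ("user42", "2013-05-02", "view")]:
    table[make_row_key(k1, k2, x)] = {b"f:": b""}

print(sorted(table))
print(split_row_key(b"user42_2013-05-01_click"))
```

The separator check matters: if a component could itself contain "_", the key would become ambiguous when split, and a prefix such as "k1_" could match rows it should not. Fixed-width or length-prefixed components are a common alternative when the data cannot guarantee a free separator byte.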

2) If you want to do range scans (or "wildcard scans") on both k1 and k2 I recommend storing the data in two tables like this:

  • Table 1: k1_k2_x
  • Table 2: k2_k1_x
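
A minimal sketch of this two-table layout, assuming in-memory sorted lists in place of the two HBase tables; DualIndexStore and its methods are hypothetical names. Each put writes both key orders, and a lookup by either k1 or k2 becomes a bounded prefix scan on the table whose key starts with that component.

```python
import bisect

SEP = b"_"


class DualIndexStore:
    """Hypothetical two-table layout: table 1 holds k1_k2_x, table 2 holds k2_k1_x."""

    def __init__(self):
        self.by_k1 = []  # table 1 row keys, kept sorted
        self.by_k2 = []  # table 2 row keys, kept sorted

    def put(self, k1, k2, x):
        # Two independent writes, one per table (the duplicated data).
        bisect.insort(self.by_k1, SEP.join([k1, k2, x]))
        bisect.insort(self.by_k2, SEP.join([k2, k1, x]))

    @staticmethod
    def _prefix_scan(keys, prefix):
        # Start at prefix; stop just past the last key with that prefix.
        # The prefix ends in SEP ("_"), so its last byte is never 0xFF.
        start = bisect.bisect_left(keys, prefix)
        stop_row = prefix[:-1] + bytes([prefix[-1] + 1])
        return keys[start:bisect.bisect_left(keys, stop_row)]

    def by_first(self, k1):
        return self._prefix_scan(self.by_k1, k1 + SEP)

    def by_second(self, k2):
        # Scan table 2, then swap the first two components back to k1_k2_x.
        result = []
        for key in self._prefix_scan(self.by_k2, k2 + SEP):
            kk2, kk1, x = key.split(SEP, 2)
            result.append(SEP.join([kk1, kk2, x]))
        return result


store = DualIndexStore()
store.put(b"a", b"x", b"1")
store.put(b"b", b"x", b"2")
store.put(b"a", b"y", b"3")
print(store.by_first(b"a"))
print(store.by_second(b"x"))
```

The two writes are independent: HBase guarantees atomicity per row, not across tables, so a client that fails between the two puts can leave the tables out of sync, and an application using this layout has to tolerate or repair the occasional mismatch.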

This is duplicate data, but will be very efficient for doing the sort of queries you want.

This is one of the tradeoffs with HBase: you get really large scaling capabilities, but lose RDBMS features, and need to work out efficient ways of inserting/querying through your row-key structure.
