Using Solr to Query HBase

Question

I have a data warehousing problem, needing to query over a large dataset. For the sake of this example, let's say a typical state would have 30 million users with activity stats for each. Ideally I could buy a data warehousing tool (Vertica, Infobright, etc.), but that's not in the cards or the budget.

Right now I'm considering using Solr to query HBase. While I believe HBase could scale to our needs, I worry about Solr. It's optimized as a search engine, i.e., the first pages of results return before the last, and there's no support for something like a database cursor. Tests so far have shown that getting a large result set out of Solr has been slower than I would've liked. For instance, a query retrieving half of the available users (which ultimately returned 500 MB of data) finished in under a minute in the community version of Infobright; in Solr it took 12 minutes.
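
Newer Solr releases (4.7 and later) added a `cursorMark` parameter for deep paging, which provides much of the cursor-like behaviour described as missing above. The following is a minimal sketch of streaming a full result set with it. It assumes the schema's uniqueKey field is `id` (cursorMark requires the sort to include the uniqueKey). It also assumes `fetch` is a hypothetical callable that sends the parameters to the `/select` handler and returns the parsed JSON response.

```python
from typing import Callable, Dict, Iterator

# Hypothetical transport: fetch(params) sends params to Solr's /select
# handler and returns the decoded JSON response as a dict.
FetchFn = Callable[[Dict[str, str]], dict]


def iter_all_docs(fetch: FetchFn, query: str, rows: int = 1000,
                  sort: str = "id asc") -> Iterator[dict]:
    """Stream every matching document using cursorMark deep paging.

    The sort must include the uniqueKey field (assumed to be "id") so the
    cursor position is unambiguous.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {"q": query, "rows": str(rows), "sort": sort,
                  "cursorMark": cursor, "wt": "json"}
        resp = fetch(params)
        for doc in resp["response"]["docs"]:
            yield doc
        next_cursor = resp["nextCursorMark"]
        # Solr signals the end by returning the same cursor it was given.
        if next_cursor == cursor:
            break
        cursor = next_cursor
```

Each request resumes from the previous cursor instead of asking Solr to skip an ever-growing `start` offset. As a result, the per-page cost stays roughly flat as you page deeper into the results.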

Is there something other than Solr that's better suited to query this data? Are there any optimizations that would help with bulk data input and output?

Answer

I know this is a bit late, but...

Depending on your search requirements, Solr could be a good option. Keep in mind that you most likely won't need to index everything in HBase. Are there certain fields you can pick out? Portions of text? You most certainly do NOT need to store this stuff in Solr if you're already storing it in HBase.
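
The following is a minimal sketch of the "index only what you search on" idea. It assumes the HBase row is already available as a dict and that the Solr uniqueKey is `id`. The function `to_solr_doc` and the field names are hypothetical. The indexed fields themselves can be declared with `indexed="true" stored="false"` in the Solr schema, so Solr holds only the index and the row data stays in HBase.

```python
from typing import Dict, Iterable


def to_solr_doc(row_key: str, row: Dict[str, str],
                indexed_fields: Iterable[str]) -> Dict[str, str]:
    """Build a slim Solr document for one HBase row (hypothetical helper).

    Only the fields you actually search on are copied. The HBase row key
    becomes the Solr uniqueKey ("id", assumed) so that full rows can later
    be fetched from HBase instead of being stored in Solr.
    """
    doc = {"id": row_key}
    for field in indexed_fields:
        if field in row:
            doc[field] = row[field]
    return doc
```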

Solr is an excellent secondary index system to put on top of HBase, and Solr also has some great text analytics capabilities if that is what you need.
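
Used as a secondary index, Solr answers the query with matching row keys (for example by requesting only `fl=id`), and HBase serves the actual rows. The following is a sketch of that second step. It assumes `multi_get` is a hypothetical wrapper around an HBase batch get that returns a dict containing only the keys it found.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


def fetch_rows(row_keys: Iterable[str],
               multi_get: Callable[[List[str]], Dict[str, Row]],
               batch_size: int = 500) -> Iterator[Row]:
    """Resolve Solr hits (row keys) to full HBase rows in batches.

    multi_get is a hypothetical wrapper around an HBase batch get that
    returns {row_key: row} for the keys that exist.
    """

    def flush(keys: List[str]) -> Iterator[Row]:
        found = multi_get(keys)
        # Keep Solr's result order; skip keys missing from HBase
        # (the index may lag behind deletes).
        for key in keys:
            if key in found:
                yield found[key]

    batch: List[str] = []
    for key in row_keys:
        batch.append(key)
        if len(batch) == batch_size:
            yield from flush(batch)
            batch = []
    if batch:
        yield from flush(batch)
```

Batching the lookups avoids one HBase round trip per hit. Skipping missing keys keeps the read path tolerant of a Solr index that is slightly out of date.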

You should also take a look at ElasticSearch, one of Solr's primary competitors.