Efficient implementation of faceted search in relational databases


Question

I am trying to implement faceted search, that is, tagging with filtering on multiple tags. In faceted navigation, only non-empty categories are displayed, and each category shows in parentheses the number of its items that also match the criteria already applied.

I can get all items that have the selected categories using INNER JOINs, and get the number of items in every category using COUNT and GROUP BY. However, I'm not sure how this will scale to millions of objects and thousands of tags, especially the counting.

I know that there are non-relational solutions like Lucene + SOLR, but I have also found some closed-source RDBMS-based implementations that are said to be enterprise-strength, like FacetMap.com or Endeca software, so there must be an efficient way to perform faceted search in relational databases.

Does anybody have experience with faceted search and could give some tips?

Should I cache the counts for each set of categories? Maybe use some smart incremental technique that updates the counters?

Edit:

An example of faceted navigation can be found here: Flamenco.

Currently I have the standard 3-table schema (items, tags and items_tags, as described here: http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html#toxi ) plus a table for facets. Each tag is assigned to a facet.
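For concreteness, the schema and the counting query discussed above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the column names, the sample rows, the index and the helper facet_counts are hypothetical and do not come from the original post.

```python
import sqlite3

# Hypothetical layout: the table names follow the post, the columns are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facets (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT,
                   facet_id INTEGER REFERENCES facets(id));
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE items_tags (item_id INTEGER REFERENCES items(id),
                         tag_id INTEGER REFERENCES tags(id),
                         PRIMARY KEY (item_id, tag_id));
CREATE INDEX items_tags_by_tag ON items_tags (tag_id, item_id);

INSERT INTO facets VALUES (1, 'color'), (2, 'size');
INSERT INTO tags VALUES (1, 'red', 1), (2, 'blue', 1),
                        (3, 'small', 2), (4, 'large', 2);
INSERT INTO items VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO items_tags VALUES (1, 1), (1, 3), (2, 1), (2, 4), (3, 2), (3, 3);
""")


def facet_counts(conn, selected_tag_ids):
    """Return (facet, tag, count) rows for items carrying ALL selected tags.

    Tags with no matching items produce no row, so empty categories
    are omitted automatically.
    """
    selected = sorted(set(selected_tag_ids))
    if selected:
        marks = ",".join("?" * len(selected))
        # Items that have every selected tag.
        matching = (
            f"SELECT item_id FROM items_tags WHERE tag_id IN ({marks}) "
            "GROUP BY item_id HAVING COUNT(DISTINCT tag_id) = ?"
        )
        params = selected + [len(selected)]
    else:
        # No filter applied yet: every item matches.
        matching = "SELECT id AS item_id FROM items"
        params = []
    sql = f"""
        SELECT f.name, t.name, COUNT(*)
        FROM ({matching}) AS m
        JOIN items_tags AS it ON it.item_id = m.item_id
        JOIN tags AS t ON t.id = it.tag_id
        JOIN facets AS f ON f.id = t.facet_id
        GROUP BY f.name, t.name
        ORDER BY f.name, t.name
    """
    return conn.execute(sql, params).fetchall()


# With 'red' (tag id 1) selected, 'blue' has no matching items and is omitted.
print(facet_counts(conn, [1]))
```

The inner subquery finds the items that match all applied tags, and the outer query counts the remaining tags per facet. Every change of filter re-runs the aggregation, which is exactly the cost the question worries about at millions of rows.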

Recommended answer

I can only confirm what Nils says: RDBMSs are not good at multi-dimensional searching. I have worked with some smart solutions, such as caching counters and using triggers. But in the end, an external dedicated indexer always wins.
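As a rough illustration of the "caching counters, using triggers" approach, here is a minimal sketch using SQLite through Python's sqlite3 module. The tag_counts table and the trigger names are hypothetical, chosen for this example only, and are not something the answer describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items_tags (item_id INTEGER, tag_id INTEGER,
                         PRIMARY KEY (item_id, tag_id));

-- Hypothetical cache table: one row per tag that has at least one item.
CREATE TABLE tag_counts (tag_id INTEGER PRIMARY KEY, n INTEGER NOT NULL);

-- Keep the cached count in step with every insert...
CREATE TRIGGER items_tags_ai AFTER INSERT ON items_tags
BEGIN
    INSERT OR IGNORE INTO tag_counts (tag_id, n) VALUES (NEW.tag_id, 0);
    UPDATE tag_counts SET n = n + 1 WHERE tag_id = NEW.tag_id;
END;

-- ...and every delete, dropping the row once the tag becomes empty.
CREATE TRIGGER items_tags_ad AFTER DELETE ON items_tags
BEGIN
    UPDATE tag_counts SET n = n - 1 WHERE tag_id = OLD.tag_id;
    DELETE FROM tag_counts WHERE tag_id = OLD.tag_id AND n <= 0;
END;
""")

conn.executemany("INSERT INTO items_tags VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2)])
conn.execute("DELETE FROM items_tags WHERE item_id = 3")

# Reading the counts is now a lookup instead of an aggregation.
counts = dict(conn.execute("SELECT tag_id, n FROM tag_counts"))
print(counts)
```

Note the limitation: this caches only the unfiltered count per tag. Counts that depend on the tags already selected would need one cache entry per tag combination, and the number of combinations grows very quickly, which is one reason such schemes become hard to maintain.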

MAYBE, if you transform your data into a dimensional model and feed it to some OLAP engine (I mean an MDX engine), it will perform well. But that seems a bit too heavy a solution, and it will definitely NOT be real-time.

In contrast, a solution with a dedicated indexing engine (think Lucene, think Sphinx) can be made near-real-time with incremental index updates.
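To give a feel for why an index built for this workload helps, here is a toy in-memory inverted index in Python with incremental updates. It is a sketch of the general idea only: the FacetIndex class and its methods are hypothetical and are not the API of Lucene, Sphinx or any other engine.

```python
from collections import Counter, defaultdict


class FacetIndex:
    """Toy inverted index: tag -> set of item ids, updated one item at a time."""

    def __init__(self):
        self.postings = defaultdict(set)  # tag -> ids of items carrying it
        self.item_tags = {}               # item id -> set of its tags

    def upsert(self, item_id, tags):
        """Add or replace a single item without rebuilding the index."""
        self.remove(item_id)
        self.item_tags[item_id] = set(tags)
        for tag in self.item_tags[item_id]:
            self.postings[tag].add(item_id)

    def remove(self, item_id):
        """Drop an item, deleting posting lists that become empty."""
        for tag in self.item_tags.pop(item_id, ()):
            self.postings[tag].discard(item_id)
            if not self.postings[tag]:
                del self.postings[tag]

    def search(self, selected):
        """Return ids of items that carry every selected tag."""
        selected = list(selected)
        if not selected:
            return set(self.item_tags)
        # Intersect posting lists, starting from the smallest one.
        sets = sorted((self.postings.get(t, set()) for t in selected), key=len)
        return set.intersection(*sets)

    def facet_counts(self, selected):
        """Count tags over the matching items; empty tags never appear."""
        counts = Counter()
        for item_id in self.search(selected):
            counts.update(self.item_tags[item_id])
        return counts


index = FacetIndex()
index.upsert(1, {"red", "small"})
index.upsert(2, {"red", "large"})
index.upsert(3, {"blue", "small"})
before = index.facet_counts({"red"})

# Incremental update: re-tagging item 3 touches only its own postings.
index.upsert(3, {"red", "small"})
after = index.facet_counts({"red"})
print(before, after)
```

Real engines add compression, on-disk segments and relevance ranking on top of this, but the core operations are the same: intersect posting lists for the selected tags, then count tags over the result.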
