Efficient Filtering / Searching

Problem Description

We have a hosted application that manages pages of content. Each page can have a number of custom fields and some standard fields (timestamp, user name, user email, etc.).

With potentially hundreds of different sites using the system, what is an efficient way to handle filtering and searching? Picture a grid view that you want to narrow down. You can filter on specific fields (userid, date) or you can enter a full-text search.

For example, "all pages started by userid 10" would be a pretty quick query against a MySQL database. But a query like "all pages started by userid 10 that match [some search query]" would perform badly against the database, so it's better suited to a search engine like Lucene.
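A query such as "pages by userid 10 that match [some search query]" can be answered by intersecting a structured filter with full-text hits. The minimal sketch below assumes a tiny in-memory inverted index in place of Lucene. The names `TinyIndex` and `pages_by_user_matching`, along with the sample pages, are hypothetical and used for illustration only.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Naive analyzer: lowercase, split on anything that isn't a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())

class TinyIndex:
    """Hypothetical in-memory inverted index standing in for Lucene."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of page ids

    def add(self, page_id, text):
        for tok in tokenize(text):
            self.postings[tok].add(page_id)

    def search(self, query):
        # AND semantics: a page must contain every query token
        sets = [self.postings.get(tok, set()) for tok in tokenize(query)]
        return set.intersection(*sets) if sets else set()

# Hypothetical sample pages: "userid" is a structured field, "body" is free text
pages = {
    1: {"userid": 10, "body": "Quarterly sales report"},
    2: {"userid": 10, "body": "Team offsite photos"},
    3: {"userid": 42, "body": "Sales pipeline review"},
}

index = TinyIndex()
for pid, page in pages.items():
    index.add(pid, page["body"])

def pages_by_user_matching(userid, query):
    # Structured filter (the part a SQL WHERE clause handles well) ...
    by_user = {pid for pid, p in pages.items() if p["userid"] == userid}
    # ... intersected with the full-text hits from the index
    return sorted(by_user & index.search(query))

print(pages_by_user_matching(10, "sales"))  # [1]
```

A real search engine can also index `userid` as its own field, so that the whole query runs inside the engine. Whether that beats filtering in the database depends on the data.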

Basically I'm wondering how other large sites do this sort of thing. Do they utilize a search engine 100% for all types of filtering? Do they mix database queries with a search engine?

If we use only a search engine, there's a problem with the delay before a new or updated object appears in the search index. I've read that it's not smart to update the index immediately, and that it should be done in batches instead. Even if that means every 5 minutes, users will get confused when their recently added page isn't listed when they view a simple page listing (say, a search query of "category:5").
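One possible way to hide the batch delay is to merge the index's results with rows the database says changed after the last index run. This is an assumption on the editor's part; neither the question nor the answer proposes it. In this sketch SQLite stands in for MySQL. The schema, `LAST_INDEXED_AT`, `batch_index`, and `list_category` are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, category INTEGER, "
    "title TEXT, updated_at INTEGER)"
)
db.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [(1, 5, "Old page", 100), (2, 5, "Another old page", 150), (3, 5, "Just added", 320)],
)

# Hypothetical state of the search index after its last batch run at t=300:
# category -> ids the index already knows about.
LAST_INDEXED_AT = 300
batch_index = {5: {1, 2}}

def list_category(category):
    indexed = batch_index.get(category, set())
    # Rows changed after the last batch haven't reached the index yet,
    # so read them straight from the database and merge them in.
    fresh = {
        row[0]
        for row in db.execute(
            "SELECT id FROM pages WHERE category = ? AND updated_at > ?",
            (category, LAST_INDEXED_AT),
        )
    }
    return sorted(indexed | fresh)

print(list_category(5))  # [1, 2, 3]
```

This only covers additions and in-place updates. Pages that were deleted or moved to another category will still be listed from the stale index until the next batch runs, unless the results are also checked against the database.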

We are using MySQL and have been looking closely at Lucene for searching. Is there some other technology I don't know about?

My thought is to offer a simple filtering page that uses MySQL to filter on basic fields, and then a separate full-text search page that presents results similar to Google's. Is this the only way?
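The "simple filtering page" idea boils down to building a parameterized WHERE clause from whichever basic fields the user filled in. The minimal sketch below runs against SQLite in place of MySQL. The whitelist `FILTERABLE`, the function `build_filter_query`, and the column names are hypothetical.

```python
import sqlite3

# Hypothetical whitelist: only these columns may appear in a WHERE clause.
# Column names are interpolated; values are always bound as parameters.
FILTERABLE = {"userid", "category"}

def build_filter_query(filters, date_from=None, date_to=None):
    clauses, params = [], []
    for column, value in sorted(filters.items()):
        if column not in FILTERABLE:
            raise ValueError(f"cannot filter on {column!r}")
        clauses.append(f"{column} = ?")
        params.append(value)
    if date_from is not None:
        clauses.append("created_at >= ?")
        params.append(date_from)
    if date_to is not None:
        clauses.append("created_at < ?")
        params.append(date_to)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT id, title FROM pages WHERE {where} ORDER BY created_at DESC"
    return sql, params

# SQLite stands in for MySQL; the schema and rows are made up for the example.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, userid INTEGER, "
    "category INTEGER, title TEXT, created_at TEXT)"
)
db.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 5, "Intro", "2009-01-05"),
        (2, 10, 7, "Pricing", "2009-02-10"),
        (3, 42, 5, "FAQ", "2009-02-11"),
    ],
)

sql, params = build_filter_query({"userid": 10}, date_from="2009-02-01")
print(db.execute(sql, params).fetchall())  # [(2, 'Pricing')]
```

Common Python MySQL drivers such as MySQLdb and PyMySQL use `%s` placeholders instead of `?`. A composite index on the filtered columns, for example `(userid, created_at)`, is what keeps this kind of query fast in MySQL.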

Solution

Solr and grassyknoll both provide slightly more abstract interfaces to Lucene.

That said: yes. If you are a primarily content-driven site providing full-text search over your data, you need something beyond LIKE. While MySQL's FULLTEXT indexes aren't perfect, they might be an acceptable placeholder in the interim.

Assuming you do create a Lucene index, linking Lucene Documents to your relational objects is pretty straightforward: simply add a stored property to the document at index time (this property can be a URL, ID, GUID, etc.). Searching then becomes a two-phase system:
1) Issue the query to the Lucene index (display simple results such as the title).
2) Get more detailed information about the object from your relational store by its key.
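The two phases can be sketched without Lucene by using a small in-memory index. The index stores only each page's key and title, and the full row stays in the relational store. SQLite stands in for MySQL here. The functions `phase1_search` and `phase2_details`, the `stored` dict, and the schema are hypothetical, so this is not Lucene's API.

```python
import re
import sqlite3
from collections import defaultdict

def tokenize(text):
    # Naive analyzer: lowercase, split on anything that isn't a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())

# Relational store (SQLite standing in for MySQL) holds the full objects.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, body TEXT, author_email TEXT)"
)
db.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [
        (1, "Lucene notes", "Inverted index basics", "a@example.com"),
        (2, "MySQL tuning", "Index hints and buffers", "b@example.com"),
    ],
)

# Search index: postings for the searchable text, plus a small stored record
# (the key and a display title) instead of a clone of the whole row.
postings = defaultdict(set)
stored = {}
for page_id, title, body in db.execute("SELECT id, title, body FROM pages"):
    for tok in tokenize(title + " " + body):
        postings[tok].add(page_id)
    stored[page_id] = {"id": page_id, "title": title}

def phase1_search(query):
    # Phase 1: query the index and return lightweight hits (key + title)
    sets = [postings.get(t, set()) for t in tokenize(query)]
    ids = set.intersection(*sets) if sets else set()
    return [stored[i] for i in sorted(ids)]

def phase2_details(page_id):
    # Phase 2: fetch the full object from the relational store by its key
    return db.execute(
        "SELECT id, title, body, author_email FROM pages WHERE id = ?", (page_id,)
    ).fetchone()

hits = phase1_search("index")
print(hits)  # [{'id': 1, 'title': 'Lucene notes'}, {'id': 2, 'title': 'MySQL tuning'}]
print(phase2_details(hits[0]["id"]))  # (1, 'Lucene notes', 'Inverted index basics', 'a@example.com')
```

Only the key and title are kept per hit. The body and author email are never copied into the index's stored records, which matches the answer's advice not to clone whole relational objects.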

Since instantiating Documents is relatively expensive in Lucene, you only want to store the fields you search in the Lucene index, rather than complete clones of your relational objects.
