SQL Server 2008 Full Text Search (FTS) versus Lucene.NET


Question

I know there have been questions in the past about SQL 2005 versus Lucene.NET, but since 2008 came out with a lot of changes to it, I was wondering if anyone can give me pros/cons (or a link to an article).

Solution

I built a medium-size knowledge base (maybe 2GB of indexed text) on top of SQL Server 2005's FTS in 2006, and have now moved it to 2008's iFTS. Both situations have worked well for me, but the move from 2005 to 2008 was actually an improvement for me.

My situation was NOT like StackOverflow's, in the sense that I was indexing data that was only refreshed nightly. However, I was trying to join search results from multiple CONTAINSTABLE statements back into each other and to relational tables.

In 2005's FTS, this meant each CONTAINSTABLE would have to execute its search on the index, return the full results, and then have the DB engine join those results to the relational tables (this was all transparent to me, but it was happening and was expensive for the queries). 2008's iFTS improved this situation because the database integration allows the multiple CONTAINSTABLE results to become part of the query plan, which made a lot of searches more efficient.
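For illustration, here is a minimal sketch of the kind of query I mean, with two CONTAINSTABLE results joined to each other and to relational tables. The schema is hypothetical (the `dbo.Articles` and `dbo.Categories` tables, their columns, and the search terms are made up for the example, not my actual knowledge base), and it assumes a full-text index already exists on `Title` and `Body`.

```sql
-- Hypothetical schema (assumption, for illustration only):
--   dbo.Articles(ArticleId PK, Title, Body, CategoryId), full-text indexed on Title and Body
--   dbo.Categories(CategoryId PK, Name)
SELECT TOP (20)
       a.ArticleId,
       a.Title,
       c.Name    AS Category,
       t.[RANK]  AS TitleRank,
       b.[RANK]  AS BodyRank
FROM dbo.Articles AS a
-- First full-text search: phrase match against the title column
JOIN CONTAINSTABLE(dbo.Articles, Title, N'"full text"') AS t
     ON t.[KEY] = a.ArticleId
-- Second full-text search: term or phrase match against the body column
JOIN CONTAINSTABLE(dbo.Articles, Body, N'lucene OR "sql server"') AS b
     ON b.[KEY] = a.ArticleId
-- Ordinary relational join alongside the two full-text results
JOIN dbo.Categories AS c
     ON c.CategoryId = a.CategoryId
ORDER BY t.[RANK] + b.[RANK] DESC;
```

Each CONTAINSTABLE call returns a `KEY` column (the full-text key of the matching row) and a `RANK` column, which is what makes the joins possible. Summing the two ranks is just one simple way to order the combined results, not a recommendation.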

I think that both 2005 and 2008's FTS engines, as well as Lucene.NET, have architectural tradeoffs that are going to align better or worse with a lot of project circumstances - I just got lucky that the upgrade worked in my favor. I can completely see why 2008's iFTS wouldn't work in the same configuration as 2005's for the highly OLTP nature of a use case like StackOverflow.com. However, I would not discount the possibility that the 2008 iFTS could be isolated from the heavy insert transaction load... but it also sounds like it could be as much work to accomplish that as to move to Lucene.NET ... and the cool factor of Lucene.NET is hard to ignore ;)

Anyway, for me, the ease and efficiency of SQL 2008's iFTS in the majority of situations probably edges out Lucene's 'cool' factor (though it is easy to use, I've never used it in a production system, so I'm reserving comment on that). I would be interested in knowing how much more efficient Lucene is (has turned out to be? is it implemented now?) in StackOverflow or similar situations.
