Configure Lucene.Net with SQL Server

Question

Has anyone used Lucene.NET instead of the full-text search that comes with SQL Server?

If so, I would be interested in how you implemented it.

For example, did you write a Windows service that queried the database every hour and then saved the results to the Lucene.NET index?

Recommended answer

Yes, I've used it for exactly what you are describing. We had two services - one for read, and one for write, but only because we had multiple readers. I'm sure we could have done it with just one service (the writer) and embedded the reader in the web app and services.
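To make the "writer service" idea concrete, here is a minimal sketch of one polling pass, assuming rows carry an increasing integer ID that can serve as a high-water mark. That assumption, the `messages` table, and the `InMemoryIndex` class are all hypothetical: the class is only a stand-in for a real Lucene.NET index writer, and SQLite stands in for SQL Server so the example runs on its own.

```python
import sqlite3


class InMemoryIndex:
    """Hypothetical stand-in for a Lucene.NET index: term -> set of row IDs."""

    def __init__(self):
        self.postings = {}

    def add_document(self, doc_id, text):
        # Very naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), ()))


def index_new_rows(conn, index, last_id):
    """Index every row with id > last_id and return the new high-water mark."""
    rows = conn.execute(
        "SELECT id, body FROM messages WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    for doc_id, body in rows:
        index.add_document(doc_id, body)
    return rows[-1][0] if rows else last_id


def demo_poll():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO messages (body) VALUES (?)",
        [("quarterly report",), ("lunch plans",)],
    )
    index = InMemoryIndex()
    mark = index_new_rows(conn, index, 0)       # first scheduled pass
    conn.execute("INSERT INTO messages (body) VALUES (?)", ("report draft",))
    mark = index_new_rows(conn, index, mark)    # next pass picks up only the new row
    return mark, index.search("report")
```

A real service would run `index_new_rows` on a timer (hourly, in the question's example) and persist the mark between runs. This sketch only handles inserts; updates and deletes would need something like a modified timestamp or a change log.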

I've used Lucene.NET as a general database indexer, so what I got back was basically DB IDs (pointing to indexed email messages), and I've also used it to get back enough info to populate search results and the like without touching the database. It's worked great in both cases, though the SQL can get a little slow, as you pretty much have to get an ID, select by that ID, and so on. We got around this by making a temp table (with just the ID column in it), bulk-inserting into it from a file (which was the output from Lucene), and then joining to the message table. That was a lot quicker.
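The temp-table trick can be sketched roughly as follows. This is an illustration under assumptions, not the original implementation: SQLite stands in for SQL Server so the example is runnable, `executemany` stands in for the bulk insert from a file, and the `messages` and `hit_ids` table names are made up.

```python
import sqlite3


def fetch_messages_by_ids(conn, ids):
    """Load search-hit IDs into a temp table, then fetch all rows with one join."""
    cur = conn.cursor()
    # Temp table holding nothing but the IDs returned by the search index.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS hit_ids (id INTEGER PRIMARY KEY)")
    cur.execute("DELETE FROM hit_ids")
    cur.executemany(
        "INSERT OR IGNORE INTO hit_ids (id) VALUES (?)", ((i,) for i in ids)
    )
    # One set-based join instead of one SELECT per ID.
    cur.execute(
        "SELECT m.id, m.subject FROM messages AS m "
        "JOIN hit_ids AS h ON h.id = m.id ORDER BY m.id"
    )
    return cur.fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT)")
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?)",
        [(i, f"subject {i}") for i in range(1, 101)],
    )
    # Pretend these IDs came back from the search index.
    return fetch_messages_by_ids(conn, [3, 42, 77])
```

The point of the design is that the database does a single join over the whole hit list rather than a round trip per ID, which matters once a search returns thousands of hits.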

Lucene isn't perfect, and you do have to think a little outside the relational database box, because it TOTALLY isn't one, but it's very very good at what it does. Worth a look, and, I'm told, doesn't have the "oops, sorry, you need to rebuild your index again" problems that MS SQL's FTI does.

BTW, we were dealing with 20-50 million emails (and around 1 million unique attachments), totaling about 20GB of Lucene index, I think, and 250+GB of SQL database + attachments.

Performance was fantastic, to say the least. Just make sure you think about, and tweak, your merge factors (which control when it merges index segments). There is no issue in having more than one segment, but there can be a BIG problem if you try to merge two segments which have 1 million items each, and you have a watcher thread which kills the process if it takes too long... (yes, that kicked our arse for a while). So keep the max number of documents per merged segment LOW (i.e., don't set it to maxint like we did!)
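To see why the cap matters, here is a toy model of segment merging. It is a deliberate simplification and not Lucene.NET's actual merge policy: it assumes a new segment is flushed every `buffer_size` documents, that `merge_factor` equal-sized segments at the tail get merged, and that a merge is skipped when the result would exceed `max_merge_docs`. All three parameter names are illustrative.

```python
def add_documents(n_docs, buffer_size, merge_factor, max_merge_docs):
    """Toy segment-merge model; returns (segment sizes, largest merge performed)."""
    segments = []
    largest_merge = 0
    for _ in range(n_docs // buffer_size):
        segments.append(buffer_size)  # flush a new small segment
        while True:
            tail = segments[-merge_factor:]
            # Only merge when merge_factor equal-sized segments sit at the tail.
            if len(tail) < merge_factor or len(set(tail)) != 1:
                break
            merged = sum(tail)
            # The cap: never build a segment larger than max_merge_docs.
            if merged > max_merge_docs:
                break
            del segments[-merge_factor:]
            segments.append(merged)
            largest_merge = max(largest_merge, merged)
    return segments, largest_merge


MAXINT = 2**31 - 1
uncapped = add_documents(100_000, 1_000, 10, MAXINT)
capped = add_documents(100_000, 1_000, 10, 10_000)
```

In this model, the uncapped run ends with a single 100,000-document segment produced by one large merge, while the capped run ends with ten 10,000-document segments and never merges more than 10,000 documents at once. A watchdog with a fixed timeout is far less likely to fire on the smaller merges.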

EDIT: Corey Trager documented how to use Lucene.NET in BugTracker.NET here.
