Optimal way to search for text contained in millions of records


Problem description


I used indexing through Lucene.net, but the best time Lucene.net can give me a result is 6 minutes.
My website depends on search performance, so I need another strategy or methodology.
Does anyone know another way (other than Lucene.net) to search for text in millions of records?

Thank you

Recommended answer

If the text records are in a database, using the database manager's built-in search functionality will probably be a lot faster. After all, that is what it is built for.
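To illustrate what "built-in search functionality" buys you, here is a minimal sketch using SQLite's FTS5 full-text index as a stand-in for a database engine's full-text search (SQL Server has an analogous feature). The table and column names are made up for the example; FTS5 is assumed to be compiled into your SQLite build, which it is in most modern Python distributions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table: the engine maintains an inverted index over the
# columns, so MATCH queries use the index instead of scanning every row.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("rec1", "full text search over millions of records"),
        ("rec2", "lucene is an inverted index library"),
        ("rec3", "nothing relevant here"),
    ],
)
# Indexed full-text lookup rather than a LIKE '%...%' table scan.
rows = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ?", ("records",)
).fetchall()
print(rows)  # [('rec1',)]
```

The same idea scales to millions of rows because the index is consulted per term, not per record; a `LIKE '%text%'` query, by contrast, degrades linearly with table size.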

If you read the text from files on the hard disk, the problem might be your implementation. I am not familiar with Lucene and how it works, but if it looks through those files on each query, there is a lot of overhead in opening and closing the files every time.

I personally once wrote a search system that indexed txt files, but since the number of them was quite small, as was the amount of content, I just read each file as one long string and dumped it into SQL Server. Performance was very good, but I don't think this would be the optimal strategy if you have 1M PDFs to index.
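The core of that approach is paying the file-reading cost once, up front, instead of on every query. A minimal sketch of the idea (the file names and contents below are invented; a real system would persist the index in a database, as described above, rather than keep it in memory):

```python
from collections import defaultdict

# Pretend these are .txt files read once at startup (contents are made up).
files = {
    "a.txt": "search millions of records quickly",
    "b.txt": "lucene builds an inverted index",
    "c.txt": "records stored on the hard disk",
}

# Inverted index: term -> set of file names. Built once, so each query is
# a dictionary lookup instead of reopening and rescanning every file.
index = defaultdict(set)
for name, text in files.items():
    for term in text.lower().split():
        index[term].add(name)

def search(term):
    """Return the files containing the term, via one index lookup."""
    return sorted(index.get(term.lower(), set()))

print(search("records"))  # ['a.txt', 'c.txt']
```

Each query is now O(1) in the number of files, which is exactly the trade Lucene makes as well; if queries are still slow with an index in place, the bottleneck is likely elsewhere (analysis, I/O, or how the index is stored).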

If you could go into some more detail about the system (where the text comes from, where it is stored, how much text is in each record, that sort of thing), we might be able to provide better, more directly usable advice.

