How does Lucene work?

Question

I would like to find out how Lucene search works so fast. I can't find any useful docs on the web. If you have anything to read (short of the Lucene source code), let me know.

In my case, a text search query using MySQL 5 text search with an index takes about 18 minutes. A Lucene search for the same query takes less than a second.

Recommended answer

Lucene is an inverted full-text index. This means that it takes all the documents, splits them into words, and then builds an index entry for each word that lists the documents containing it. Because a lookup in that index is an exact, unordered string match, it can be extremely fast. Hypothetically, an unordered SQL index on a varchar field could be just as fast, and in fact I think you'll find that the big databases can run a simple string-equality query very quickly in that case.
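
To make the idea concrete, here is a minimal sketch of an inverted index in Python. It is a toy illustration only, not Lucene's actual data structures or API: the names `tokenize`, `build_inverted_index`, and `search`, the sample documents, and the simple lowercase/alphanumeric tokenizer are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Each lookup is an exact-match dictionary access, one per query word.
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Lucene is an inverted full-text index",
    2: "MySQL supports full-text search",
    3: "An index maps words to documents",
}
index = build_inverted_index(docs)
```

With this in place, `search(index, "inverted index")` looks up two words and intersects two small sets, without scanning the text of any document. That is the property the answer points to: query cost depends on the number of query words and the size of their document lists, not on the total amount of text indexed.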

Lucene does not have to optimize for transaction processing. When you add a document, it need not ensure that queries see it instantly. And it need not optimize for updates to existing documents.
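
The trade-off described above can be sketched with a toy index that buffers additions and only makes them searchable on an explicit refresh. This is an illustration under stated assumptions, not how Lucene is implemented: the class `BufferedIndex` and its methods `add`, `refresh`, and `search` are hypothetical names invented for this example.

```python
class BufferedIndex:
    """Toy index where added documents become searchable only after refresh()."""

    def __init__(self):
        self._searchable = {}  # word -> set of doc ids visible to queries
        self._pending = []     # (doc_id, text) pairs not yet visible

    def add(self, doc_id, text):
        """Queue a document; this is cheap because no index structure is touched."""
        self._pending.append((doc_id, text))

    def refresh(self):
        """Fold all queued documents into the searchable index in one batch."""
        for doc_id, text in self._pending:
            for word in text.lower().split():
                self._searchable.setdefault(word, set()).add(doc_id)
        self._pending.clear()

    def search(self, word):
        """Return ids of visible documents containing the word."""
        return set(self._searchable.get(word.lower(), set()))
```

A document passed to `add` is not returned by `search` until `refresh` runs. Giving up instant visibility lets the index do its expensive work in batches, which a transactional database generally cannot do.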

However, at the end of the day, if you really want to know, you need to read the source. Both things you reference are open source, after all.
