Lucene - is it the right answer for a huge index?

Question

Is Lucene capable of indexing 500M text documents of 50K each?

What performance can be expected from such an index, for a single-term search and for a 10-term search?

Should I be worried and move directly to a distributed index environment?

Saar

Recommended answer

Yes, Lucene should be able to handle this, according to the following article: http://www.lucidimagination.com/content/scaling-lucene-and-solr

Here is a quote:

Depending on a multitude of factors, a single machine can easily host a Lucene/Solr index of 5 – 80+ million documents, while a distributed solution can provide subsecond search response times across billions of documents.

The article goes into great depth about scaling to multiple servers, so you can start small and scale out if needed.
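As a rough illustration of what the quoted range implies for 500 million documents, here is a back-of-the-envelope sketch. The per-machine capacities are simply the two ends of the "5 – 80+ million" range from the quote, and the function name is made up for this example; real capacity depends on hardware, document size, and query load, so treat the output as an order-of-magnitude guess only.

```python
# Rough shard-count estimate based only on the quoted per-machine range.
# The capacities are assumptions taken from the quote, not measurements.

TOTAL_DOCS = 500_000_000             # figure from the question
DOCS_PER_MACHINE_LOW = 5_000_000     # low end of the quoted range
DOCS_PER_MACHINE_HIGH = 80_000_000   # high end ("80+" in the quote)


def machines_needed(total_docs: int, docs_per_machine: int) -> int:
    """Return the number of machines needed, rounding up (integer ceiling)."""
    return -(-total_docs // docs_per_machine)


if __name__ == "__main__":
    best_case = machines_needed(TOTAL_DOCS, DOCS_PER_MACHINE_HIGH)
    worst_case = machines_needed(TOTAL_DOCS, DOCS_PER_MACHINE_LOW)
    print(f"Between {best_case} and {worst_case} machines")
```

Under these assumptions, a single machine is unlikely to be enough for the full data set, but a modest cluster may be.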

A great resource on Lucene's performance is the blog of Mike McCandless, who is actively involved in the development of Lucene: http://blog.mikemccandless.com/. He often uses Wikipedia's content (25 GB) as test input for Lucene.

It may also be of interest that Twitter's real-time search is now implemented with Lucene (see http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html).

However, I wonder whether the numbers you provided are correct: 500 million documents x 50 KB = ~23 TB. Do you really have that much data?
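The ~23 TB figure can be checked with a few lines of arithmetic. This sketch assumes "50K" means 50 KiB (50 x 1024 bytes) and reports the total in both binary (TiB) and decimal (TB) units; the function name is hypothetical. It measures raw document text only, not the size of the resulting index.

```python
# Sanity check of the raw data volume: 500 million documents of 50K each.
# Assumption: "50K" is read as 50 KiB (50 * 1024 bytes).

DOC_COUNT = 500_000_000
DOC_SIZE_BYTES = 50 * 1024


def total_size(doc_count: int, doc_size_bytes: int) -> tuple[int, float, float]:
    """Return (total bytes, total in TiB, total in decimal TB)."""
    total_bytes = doc_count * doc_size_bytes
    return total_bytes, total_bytes / 1024**4, total_bytes / 1000**4


if __name__ == "__main__":
    total_bytes, tib, tb = total_size(DOC_COUNT, DOC_SIZE_BYTES)
    print(f"{total_bytes} bytes = {tib:.1f} TiB = {tb:.1f} TB")
```

With binary units the total comes to roughly 23.3 TiB, which matches the ~23 TB quoted above; in decimal units it is 25.6 TB.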
