Multiple indexers on same storage location in Lucene


Question



I want to build a highly scalable application that uses Lucene as its search engine library. While browsing the docs and FAQs, I realized that Lucene allows only one IndexWriter to be open on a storage location at a time, which it enforces by creating a write.lock file in the index directory. Multiple IndexReaders can be open on that index.
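The single-writer rule described above can be illustrated without Lucene itself. The sketch below uses only `java.nio` to take an OS-level lock on a `write.lock` file, which is broadly how Lucene's default filesystem lock factory works as far as I understand it. The class `WriteLockSketch` and its helpers `tryAcquire` and `release` are hypothetical names for illustration, not Lucene APIs.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class WriteLockSketch {
    // Lock files already held by this JVM. OS file locks are per process,
    // so a second attempt inside the same JVM has to be tracked separately.
    private static final Set<Path> HELD_IN_THIS_JVM = ConcurrentHashMap.newKeySet();

    private static Path lockFileFor(Path indexDir) {
        return indexDir.toAbsolutePath().normalize().resolve("write.lock");
    }

    /** Hypothetical helper: returns the held lock, or null if a writer already owns it. */
    public static FileLock tryAcquire(Path indexDir) throws IOException {
        Path lockFile = lockFileFor(indexDir);
        if (!HELD_IN_THIS_JVM.add(lockFile)) {
            return null; // another writer in this JVM holds the lock
        }
        FileChannel channel = FileChannel.open(
                lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = null;
        try {
            lock = channel.tryLock(); // null if another process holds the lock
            return lock;
        } finally {
            if (lock == null) {
                channel.close();
                HELD_IN_THIS_JVM.remove(lockFile);
            }
        }
    }

    /** Hypothetical helper: releases the lock so that another writer may open the index. */
    public static void release(Path indexDir, FileLock lock) throws IOException {
        try (FileChannel channel = lock.channel()) {
            lock.release();
        } finally {
            HELD_IN_THIS_JVM.remove(lockFileFor(indexDir));
        }
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("index");
        FileLock first = tryAcquire(indexDir);
        FileLock second = tryAcquire(indexDir);
        System.out.println("first writer got the lock: " + (first != null));
        System.out.println("second writer got the lock: " + (second != null));
        release(indexDir, first);
    }
}
```

The point of the sketch is only that the second attempt is refused while the first lock is held; whether such locks behave reliably on a network file system depends on the file system and is not something this code demonstrates.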

I am interested in building an architecture in which a number of indexers run on different machines/servers, and multiple searchers answer various types of queries on the indexes those indexers create. The searchers and indexers will all run on different computers.

In such a scenario it would be preferable to have multiple indexers use the same index storage location to index documents. How can I achieve this? Should I go with something like NFS (Network File System)? Has this issue been taken care of by Solr or some other framework built on top of Lucene? One obvious solution that comes to mind is to create one index per indexer and then have the searchers query across multiple index directories. But this would lead to a large number of index directories, as many as there are indexer servers, which I guess is not desirable. I want (# of index dirs) << (# of indexers) < (# of searchers).

What alternatives do I have in this case?

Answer

First of all: never use NFS with Lucene; it's simply slow and risky.

When it comes to scalability and high availability, I'd suggest you just let Elasticsearch do all the hard work for you, so that you can concentrate on your data. You can of course have multiple threads indexing data.
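To make the idea concrete, here is a minimal sketch of many indexer threads feeding a small, fixed number of shards, so that (# of index dirs) stays far below (# of indexers). It uses only the Java standard library. `Shard` is a hypothetical stand-in for one index directory with a single thread-safe writer (in Lucene, one `IndexWriter` instance can be shared by several threads), and `shardFor` uses `String.hashCode` purely as an illustrative routing function. Elasticsearch also routes documents to shards by hashing a routing value, but with its own hash function and machinery, so treat this as an illustration of the pattern rather than of its implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShardedIndexingSketch {

    /** Hypothetical stand-in for one index directory with a single shared writer. */
    static final class Shard {
        private final List<String> docIds = new ArrayList<>();

        synchronized void add(String docId) {
            docIds.add(docId);
        }

        synchronized int size() {
            return docIds.size();
        }
    }

    /** Picks the shard for a document id; the hash choice is an assumption. */
    public static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    /** Runs numIndexers threads that together add numDocs documents to numShards shards. */
    public static int[] indexAll(int numShards, int numIndexers, int numDocs)
            throws InterruptedException {
        Shard[] shards = new Shard[numShards];
        for (int i = 0; i < numShards; i++) {
            shards[i] = new Shard();
        }

        ExecutorService indexers = Executors.newFixedThreadPool(numIndexers);
        for (int t = 0; t < numIndexers; t++) {
            final int offset = t;
            indexers.submit(() -> {
                // Each indexer handles every numIndexers-th document.
                for (int d = offset; d < numDocs; d += numIndexers) {
                    String docId = "doc-" + d;
                    shards[shardFor(docId, numShards)].add(docId);
                }
            });
        }
        indexers.shutdown();
        if (!indexers.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("indexing did not finish in time");
        }

        int[] sizes = new int[numShards];
        for (int i = 0; i < numShards; i++) {
            sizes[i] = shards[i].size();
        }
        return sizes;
    }

    public static void main(String[] args) throws InterruptedException {
        // 8 indexer threads, but only 2 shards (index directories).
        int[] sizes = indexAll(2, 8, 1000);
        System.out.println("documents per shard: " + Arrays.toString(sizes));
    }
}
```

Because every document id maps deterministically to one shard, each shard still has exactly one writer, and searchers only need to query a small, fixed set of shards.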

If you want to know more about the distributed nature of Elasticsearch, I'd suggest you have a look at this video.
