Work out Analyzer, Version, etc. from Lucene index files?

Question

Just double-checking: I assume this is not possible, and that if you want such info bundled with the index files in your index directory, you have to work out a way to do it yourself.

Obviously you might be using different Analyzers for different directories, and 99% of the time it is pretty important to use the right one when constructing a QueryParser: if your QP has a different one, all sorts of inaccuracies might crop up in the results.

Equally, getting the wrong Version of the index files might, for all I know, not result in a complete failure: again, you might instead get inaccurate results.

Have the Lucene developers ever considered bundling this sort of info with the index files? And does anyone know whether any Lucene-based applications, such as Elasticsearch, incorporate such a mechanism?

Actually, just looking inside the "_0" files (_0.cfe, _0.cfs and _0.si) of an index, all 3 do actually contain the word "Lucene" seemingly followed by version info. Hmmm...

PS, a related thought: say you are indexing a text document of some kind (or 1000 documents) and you want to keep your index up to date each time it is opened. One obvious way to do this would be to compare the last-modified date of each file with the last time the index was updated: any documents that are now out of date would need their info removed from the index and would then have to be re-indexed.

This need must come up all the time with Lucene indices. How is it generally tackled, in the absence of helpful "meta info" stored with the index files proper?
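
A minimal sketch of the last-modified comparison described above, using only the JDK. It assumes the source documents live under one root directory and that you have recorded, somewhere of your own choosing, the time the index was last updated (`lastIndexed` here). `StaleDocs` and `modifiedSince` are hypothetical names, not Lucene API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: finds source files changed since the index was last updated. */
final class StaleDocs {
    private StaleDocs() {}

    /** Returns regular files under root whose last-modified time is strictly after lastIndexed. */
    static List<Path> modifiedSince(Path root, Instant lastIndexed) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)                        // skip directories
                    .filter(p -> lastModified(p).isAfter(lastIndexed))   // out of date in the index
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static Instant lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toInstant();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each returned path would then be re-indexed. If every document stores its path in an untokenized field (the field name is up to you), `IndexWriter.updateDocument(Term, doc)` deletes the documents matching that term and adds the new version in one call. Files deleted from disk are not caught by this check and would need a separate pass.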

Answer

For anyone interested in this question:

From what I said above, the Version does appear to be contained in the index files. I looked at the CheckIndex class and the various info you can get from it, e.g. CheckIndex.Status.SegmentInfoStatus, but found no way to obtain the Version. I'm starting to assume this is deliberate, and that the idea is simply to let Lucene handle updating the index as required. Not an entirely satisfactory state of affairs, if so...

As for other things, such as the Analyzer class, it appears you have to implement this sort of "metadata" yourself if you want it. This could be done by simply including a text file alongside the other index files, or alternatively it appears you can use the IndexData class. Of course, your Version could also be stored this way.
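
A minimal sketch of the "text file alongside the index files" approach, using `java.util.Properties` from the JDK. The file name and key names are hypothetical conventions; Lucene does not read this file. It also assumes Lucene leaves files it does not recognise in the index directory alone; if you are unsure, keep the file next to the directory rather than inside it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical sidecar file recording how an index was built. */
final class IndexMetadata {
    // Assumed names; these are application conventions, not anything Lucene defines.
    static final String FILE_NAME = "index-metadata.properties";
    static final String ANALYZER_KEY = "analyzer.class";
    static final String VERSION_KEY = "lucene.version";

    private IndexMetadata() {}

    /** Writes (or overwrites) the sidecar file in indexDir. */
    static void write(Path indexDir, String analyzerClass, String luceneVersion) {
        Properties props = new Properties();
        props.setProperty(ANALYZER_KEY, analyzerClass);
        props.setProperty(VERSION_KEY, luceneVersion);
        try (OutputStream out = Files.newOutputStream(indexDir.resolve(FILE_NAME))) {
            props.store(out, "Written by application code, not by Lucene");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the sidecar file back; fails if it is missing. */
    static Properties read(Path indexDir) {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(indexDir.resolve(FILE_NAME))) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

When opening the index, the stored class name can be compared with `analyzer.getClass().getName()` of the Analyzer you are about to pass to the QueryParser. The drawback of a separate file is that it is not updated atomically with Lucene's own commits.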

For writing such info, see IndexWriter.setCommitData().

For retrieving such info, you have to use one of several (?) subclasses of IndexReader, such as DirectoryReader.
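
A minimal, JDK-only sketch of the data you would hand to the commit. The commit user data is a `Map<String, String>` saved with each commit; in the Lucene 4.x API you pass it to `IndexWriter.setCommitData(...)` before `commit()`, and read it back with `DirectoryReader.getIndexCommit().getUserData()`. Later Lucene releases replaced `setCommitData` with `setLiveCommitData`, so check the javadoc for your version. The Lucene calls appear only in comments here; `CommitMetadata` and its key names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that builds and checks the commit user-data map. */
final class CommitMetadata {
    // Key names are an application convention, not defined by Lucene.
    static final String ANALYZER_KEY = "analyzer.class";
    static final String VERSION_KEY = "lucene.version";

    private CommitMetadata() {}

    /** Builds the map to pass to IndexWriter.setCommitData(...) before commit(). */
    static Map<String, String> toUserData(String analyzerClass, String luceneVersion) {
        Map<String, String> data = new HashMap<>();
        data.put(ANALYZER_KEY, analyzerClass);
        data.put(VERSION_KEY, luceneVersion);
        return data;
    }

    /**
     * Checks a map obtained from DirectoryReader.getIndexCommit().getUserData()
     * against the Analyzer class about to be given to the QueryParser.
     */
    static void requireAnalyzer(Map<String, String> userData, String expectedAnalyzerClass) {
        String stored = userData.get(ANALYZER_KEY);
        if (stored == null) {
            throw new IllegalStateException("index has no recorded analyzer");
        }
        if (!stored.equals(expectedAnalyzerClass)) {
            throw new IllegalStateException(
                    "index was built with " + stored + " but the query uses " + expectedAnalyzerClass);
        }
    }
}
```

Compared with a separate text file, this keeps the metadata inside Lucene's own commit, so it is written atomically with the index changes it describes.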
