How can I read a Lucene document's field tokens after they are analyzed?
Question
If I create a document and add a field that is both stored and analyzed, how can I then read this field back as a list of tokens? I have the following:
Document doc = new Document();
doc.add(new Field("url", fileName, Store.YES, Index.NOT_ANALYZED));
doc.add(new Field("text", fileContent, Store.YES, Index.ANALYZED));
// add the document to the index
writer.addDocument(doc);
So fileContent is a String containing a lot of text. It is analyzed, and therefore tokenized, when it is stored in the index. However, how can I get these tokens? I can retrieve the document from the index after it is stored, and I can read the "text" field from the document, but this is returned as a String. I would like to get the tokens if possible. My 'writer' is an IndexWriter instance that uses a StandardAnalyzer. Any pointers would be very welcome.
Many thanks.
Recommended answer
Take a look at document.getField("name").tokenStreamValue().
Actually, this question gives you the full solution using the above TokenStream.
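For illustration, here is a minimal sketch of one way to get the tokens as a list. It is not taken from the linked question. It assumes Lucene 3.6 (matching the Field/Store/Index style used in the question), and the class name FieldTokens and the helper tokenize are hypothetical. As far as I know, tokenStreamValue() only returns a non-null stream when the field was built from a TokenStream, so for a field created from a String (or read back from the index) the sketch runs the stored text through the same analyzer again:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class; assumes Lucene 3.6 is on the classpath.
public class FieldTokens {

    // Runs `text` through `analyzer` and collects the resulting terms in order.
    static List<String> tokenize(Analyzer analyzer, String fieldName, String text)
            throws IOException {
        List<String> tokens = new ArrayList<String>();
        TokenStream stream = analyzer.tokenStream(fieldName, new StringReader(text));
        // The attribute instance is reused and updated on every incrementToken() call.
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        try {
            stream.reset();
            while (stream.incrementToken()) {
                tokens.add(term.toString());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Use the same analyzer configuration that the IndexWriter was created with.
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        // In the question's setup the text would come from doc.get("text").
        System.out.println(tokenize(analyzer, "text", "Quick brown fox"));
    }
}
```

The tokens only match what was indexed if the analyzer passed to tokenize is configured the same way as the one used by the IndexWriter.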