How can I read a Lucene document field's tokens after they are analyzed?

Question

If I create a document and add a field that is both stored and analyzed, how can I then read this field back as a list of tokens? I have the following:

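            import org.apache.lucene.document.Document;
            import org.apache.lucene.document.Field;
            import org.apache.lucene.document.Field.Index;
            import org.apache.lucene.document.Field.Store;

            Document doc = new Document();
            doc.add(new Field("url", fileName, Store.YES, Index.NOT_ANALYZED));  // stored, indexed as one untokenized term
            doc.add(new Field("text", fileContent, Store.YES, Index.ANALYZED));  // stored, and tokenized by the analyzer
            // add the document to the index ('writer' is an IndexWriter using StandardAnalyzer)
            writer.addDocument(doc);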

So fileContent is a String containing a lot of text. It is analyzed, and therefore tokenized, when it is stored in the index. However, how can I get these tokens? I can retrieve the document from the index after it is stored, and I can read the "text" field from it, but this is returned as a String. I would like to get the tokens if possible. My 'writer' is an IndexWriter instance that uses a StandardAnalyzer. Any pointers would be very welcome.

Many thanks

Recommended answer

Look at document.getField("name").tokenStreamValue().

Actually this question gives you the full solution using the above TokenStream.
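The linked solution is not reproduced here. One caveat worth checking against your Lucene version: a Document loaded back from the index usually holds only the stored string for each field, so tokenStreamValue() on it is likely to return null. A common workaround is to run the stored string through the same analyzer that was used at index time and iterate the resulting TokenStream (reset, incrementToken until false, end, close). The sketch below assumes Lucene 5 or later, where StandardAnalyzer has a no-argument constructor (Lucene 3.x requires a Version argument such as Version.LUCENE_36); the class AnalyzedTokens and its tokens method are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper: re-analyzes a stored field value with the same
// Analyzer used at index time and collects the resulting terms.
public class AnalyzedTokens {

    public static List<String> tokens(Analyzer analyzer, String fieldName, String text)
            throws IOException {
        List<String> result = new ArrayList<>();
        // try-with-resources closes the TokenStream even if analysis fails
        try (TokenStream ts = analyzer.tokenStream(fieldName, new StringReader(text))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before the first incrementToken()
            while (ts.incrementToken()) {
                result.add(term.toString()); // copy the text: the attribute is reused per token
            }
            ts.end();                        // finalize stream state after the last token
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumes Lucene 5+; in Lucene 3.x use new StandardAnalyzer(Version.LUCENE_36).
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // In the question, this string would come from doc.get("text")
            String stored = "Lucene Analyzers produce Tokens";
            System.out.println(tokens(analyzer, "text", stored));
            // prints [lucene, analyzers, produce, tokens]
        }
    }
}
```

Re-analysis only reproduces the indexed tokens if the analyzer and its configuration match the ones the IndexWriter used when the document was added.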