Is there an HTML analyzer/tokenizer for Lucene?

Views: 56 | Posted: 2020/5/4 7:33:43 | Tag: lucene

I want to index the text of HTML documents in Lucene. What is the best way to achieve this?
Is there a good contrib module for Lucene that can do this?

EDIT
I finally ended up using Jericho Parser. It doesn't create a DOM and is easy to use.
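
For anyone landing here, a rough sketch of the general idea (strip the markup in a streaming pass, then hand plain text to Lucene). This is not Jericho code: to keep it dependency-free it swaps in the JDK's built-in callback parser (`javax.swing.text.html.parser.ParserDelegator`), which likewise reports text as it goes instead of building a DOM. The class name `HtmlTextExtractor` and the method `extractText` are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of Lucene or Jericho.
public class HtmlTextExtractor {

    /** Streams through the HTML and collects only the character data; no DOM is built. */
    public static String extractText(String html) throws IOException {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of text between tags; separate runs with a space.
                text.append(data).append(' ');
            }
        };
        // Third argument: ignore any charset declared inside the markup,
        // since the input is already a decoded String.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        // Collapse whitespace so the analyzer receives one clean string.
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Hello</h1><p>Lucene <b>rocks</b></p></body></html>";
        System.out.println(extractText(html));
    }
}
```

The returned string can then go into an ordinary Lucene text field (for example `new TextField("body", text, Field.Store.NO)`, where the field name `body` is just a placeholder). With Jericho, the equivalent extraction step should be roughly `new Source(html).getTextExtractor().toString()`; check that against the Jericho javadoc for your version.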