Is there an HTML analyzer/tokenizer for Lucene?

Views: 18 · Posted: 2022/1/15 13:13:22 · Tag: lucene


I want to index the text of HTML documents in Lucene. What is the best way to do this?
Is there a good contrib module for Lucene that can do it?

EDIT
I ended up using Jericho Parser. It doesn't build a DOM and is easy to use.
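
For readers who want to see the general shape of "strip the markup first, then index the plain text", here is a minimal sketch. It does not use Jericho's API; as a stand-in it uses the callback-based HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`), which also reports text as it is encountered rather than building a DOM. The class name `HtmlTextExtractor` and the method `extractText` are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not part of Lucene or Jericho): collects the text
// chunks reported by the JDK's callback-based HTML parser.
class HtmlTextExtractor {

    // Returns the text found in the HTML, with chunks joined by a space.
    static String extractText(String html) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text between tags.
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        try {
            // true = ignore any charset declared inside the document,
            // since the input is already a decoded String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>Lucene</b> world</p></body></html>";
        System.out.println(extractText(html));
    }
}
```

The returned string is what would then be handed to Lucene as the value of an indexed text field. The same pattern applies with Jericho or any other parser: extract plain text outside Lucene, then index it with an ordinary analyzer.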