Can Solr retain the formatting of the HTML documents which were fed to it in its results?

Views: 25 Published: 2021/11/14 23:48:36 solr solrj apache-tika solr-cell

Problem description

How do I maintain the original formatting of an HTML document in the results returned by Solr?

I am trying to provide search functionality on one of my company's websites, which has millions of documents. They do not share a common format, so it is hard to format each document individually.

I am using a Solr 4.1 nightly build from the Apache site, which has built-in support for solr-cell and Tika, i.e. I do not need to configure them separately.

Do solr-cell or Tika retain this formatting anywhere?

If they do not retain the formatting, I will need to fetch each document from its physical file location using Solr's resourcename field and then apply the highlighting and other ready-made Solr functionality myself, but that process is too tedious.
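
For illustration only, the manual fallback described above might look roughly like the Python sketch below. It assumes the original HTML has already been read from the path stored in resourcename and that the search terms are known. The names `Highlighter` and `highlight_html` are hypothetical and are not part of Solr, solr-cell, or Tika. The sketch re-emits the markup as it was and wraps matching terms in `<em>` only inside text nodes, so tags and attributes keep their formatting.

```python
import re
from html.parser import HTMLParser


class Highlighter(HTMLParser):
    """Re-emit HTML as-is, wrapping term matches found in text nodes with <em>."""

    def __init__(self, terms):
        # convert_charrefs=False keeps entities such as &amp; separate from text.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.raw = False  # True while inside <script> or <style>
        self.pattern = re.compile(
            "|".join(re.escape(t) for t in terms), re.IGNORECASE
        )

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written in the source.
        self.out.append(self.get_starttag_text())
        if tag in ("script", "style"):
            self.raw = True

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag in ("script", "style"):
            self.raw = False

    def handle_data(self, data):
        if self.raw:
            self.out.append(data)  # never touch script or style content
        else:
            self.out.append(
                self.pattern.sub(lambda m: f"<em>{m.group(0)}</em>", data)
            )

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def highlight_html(html, terms):
    """Return html with each term wrapped in <em>, leaving the markup intact."""
    if not terms:
        return html
    parser = Highlighter(terms)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Doing this for every hit means opening and re-parsing the source file on each request, which is the overhead the question is trying to avoid.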

What can I use as a request handler if I have to use "HTMLStripCharFilterFactory", as suggested by Jayendra in the answer? Can I also extract metadata tags in that case?

Can anyone guide me on this?

Thank you for all your support!
