Using ElasticSearch and/or Solr as a datastore for MS Office and PDF documents

Published: 2021/12/13 12:31:17 Tags: pdf, solr, elasticsearch, ms-office

Question

I'm currently designing a full-text search system in which users run text queries against MS Office and PDF documents, and the result is a list of the documents that best match the query. The user will then be able to select any returned document and view it in MS Word, Excel, or a PDF viewer.

Can I use ElasticSearch or Solr to import the raw binary documents (i.e. .docx, .xlsx, and .pdf files) into its "data store", and then export a document to the user's device on request for viewing?

Previously, I used MongoDB 2.6.6 to import the raw files into GridFS and the extracted text into a separate collection (which had a text index), and that worked fine. However, MongoDB's full-text search is quite basic, so I'm now looking at either Solr or ElasticSearch to perform more complex text searches.

Nick
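For readers who want to picture the two-store layout described in the question (raw files in one place, extracted text in a separate searchable index that points back to them), here is a minimal sketch. It is an illustration only, not the MongoDB/GridFS, Solr, or ElasticSearch API: `BlobStore`, `TextIndex`, and `ingest` are hypothetical in-memory stand-ins, the ranking is a naive count of matching terms, and text extraction from the Office/PDF files is assumed to have happened elsewhere.

```python
import re
import uuid
from collections import defaultdict


class BlobStore:
    """Hypothetical stand-in for a binary file store (the role GridFS played)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Store the raw bytes untouched and hand back an opaque id.
        blob_id = uuid.uuid4().hex
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


class TextIndex:
    """Hypothetical stand-in for the search side: a tiny inverted index."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of blobs containing it
        self._names = {}                   # blob id -> original filename

    @staticmethod
    def _tokens(text: str):
        return re.findall(r"\w+", text.lower())

    def add(self, blob_id: str, filename: str, text: str) -> None:
        self._names[blob_id] = filename
        for term in self._tokens(text):
            self._postings[term].add(blob_id)

    def search(self, query: str):
        """Return (blob_id, filename) pairs, most matching query terms first."""
        scores = defaultdict(int)
        for term in set(self._tokens(query)):
            for blob_id in self._postings.get(term, ()):
                scores[blob_id] += 1
        ranked = sorted(scores, key=lambda b: (-scores[b], self._names[b]))
        return [(b, self._names[b]) for b in ranked]


blobs = BlobStore()
index = TextIndex()


def ingest(filename: str, raw: bytes, extracted_text: str) -> str:
    """Store the original file, then index its already-extracted text."""
    blob_id = blobs.put(raw)
    index.add(blob_id, filename, extracted_text)
    return blob_id


# Placeholder bytes and text; real input would come from actual files.
ingest("report.pdf", b"%PDF-1.4 ...", "Quarterly sales report for the north region")
ingest("budget.xlsx", b"PK\x03\x04 ...", "Annual budget and sales forecast")

hits = index.search("sales report")
top_id, top_name = hits[0]
original_bytes = blobs.get(top_id)  # the bytes to send to the user's viewer
print(top_name, len(original_bytes))
```

The point of the split is that the search side only ever sees text and an id, while the original file is returned byte for byte when the user picks a result. Whether the search engine itself should also hold the binaries is exactly what the question asks, and this sketch takes no position on it.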
