Search inside an attachment in RavenDB

Question

I've started learning about NoSQL, but I can't find any good examples for RavenDB. Can anybody tell me how to add a binary document (Word, PDF, Excel, ...) as an attachment in RavenDB and search the content of that document? Are there any examples of this? Is it even possible? How can I build an MVC application for it?

Solution

First, understand that when we talk about "document databases" in NoSQL, we aren't talking about Word, PDF, Excel documents. We are usually talking about a document in JSON format that represents some specific data, usually serialized from domain entities. The vast majority of RavenDB is focused on working with this sort of data.
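To make the "JSON document serialized from a domain entity" idea concrete, here is a minimal sketch in Python. The `Order` and `OrderLine` classes and the `to_document` helper are hypothetical names invented for illustration. They are not part of RavenDB or its client API, which handles this serialization for you.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class OrderLine:
    # Hypothetical nested value inside the entity.
    product: str
    quantity: int


@dataclass
class Order:
    # Hypothetical domain entity; a document database would store it as one JSON document.
    customer: str
    lines: List[OrderLine] = field(default_factory=list)


def to_document(entity) -> str:
    """Serialize a dataclass entity (including nested dataclasses) to JSON text."""
    return json.dumps(asdict(entity), indent=2)


if __name__ == "__main__":
    order = Order(customer="Ada", lines=[OrderLine(product="keyboard", quantity=2)])
    print(to_document(order))
```

The point is only that a "document" here is structured data with nested fields, not a binary file such as a PDF.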

However, you can indeed work with the sort of documents you are talking about. It's done with an add-on "bundle", not something that is built in. It's called the "Indexed Attachments Bundle", and I wrote it. You'll find the source code here. There are also unit tests that show how it can be used. For example, see this test. If you are interested in highlighting the search results, see this test also.

The bundle uses Windows IFilters to extract text from the binary document. You will need to have the appropriate IFilters for the document types you plan to work with installed on your local system. If you plan to do a lot with PDF files, I highly recommend the Foxit PDF IFilter. It is much better and faster than Adobe's. If you are just working with Word and Excel documents, you may need the Office IFilters from Microsoft. Download either the x86 or x64 version, plus the Service Pack.

With the appropriate IFilter installed, simply upload an attachment to RavenDB. The bundle will intercept the upload, extract its contents with the IFilter, save the contents to a JSON document, and index that document for easy searching.
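The flow described above (intercept the upload, extract text, save it as a JSON document, index it) can be sketched as a toy in-memory model. This is not the bundle's code: the real bundle runs inside the RavenDB server and relies on Windows IFilters and RavenDB's own indexes. Every name below (`AttachmentStore`, `EXTRACTORS`, `put_attachment`, `search`) is an assumption made for illustration, and the plain-text extractor merely stands in for an IFilter.

```python
import json
import re
from collections import defaultdict

# Hypothetical stand-ins for IFilters: one text extractor per file extension.
EXTRACTORS = {
    ".txt": lambda data: data.decode("utf-8"),
}


class AttachmentStore:
    """Toy model of the upload -> extract -> save -> index pipeline."""

    def __init__(self):
        self.attachments = {}          # key -> raw bytes
        self.documents = {}            # key -> JSON text holding the extracted content
        self.index = defaultdict(set)  # term -> keys of attachments containing it

    def put_attachment(self, key, data):
        # 1. Store the binary attachment as uploaded.
        self.attachments[key] = data
        # 2. Pick an extractor by file extension (the real bundle uses an IFilter).
        ext = key[key.rfind("."):].lower() if "." in key else ""
        extractor = EXTRACTORS.get(ext)
        if extractor is None:
            return  # No extractor available: stored, but not searchable.
        text = extractor(data)
        # 3. Save the extracted text as a JSON document.
        self.documents[key] = json.dumps({"attachment": key, "text": text})
        # 4. Index the terms. This sketch does not clean up after re-uploads.
        for term in re.findall(r"\w+", text.lower()):
            self.index[term].add(key)

    def search(self, term):
        """Return the keys of attachments whose extracted text contains the term."""
        return sorted(self.index.get(term.lower(), set()))


if __name__ == "__main__":
    store = AttachmentStore()
    store.put_attachment("notes/report.txt", b"Quarterly revenue grew")
    store.put_attachment("images/logo.png", b"\x89PNG")
    print(store.search("revenue"))
```

Note how an attachment with no matching extractor is still stored but never becomes searchable, which mirrors why the right IFilter has to be installed before uploading.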

You can also get a compiled version of the bundle from NuGet here. The DLL needs to go in the plugins directory on your RavenDB server.

I do not currently have a full end-to-end sample of an application or website that uses this bundle. I also do not have any documentation on this bundle - so be sure to read through the unit tests.

If you just need information about attachments in general, not about indexing or searching them, then you should read the RavenDB documentation.
