Using full-text search with PDF files in SQL Server 2005


Question

I've got a strange problem with indexing PDF files in SQL Server 2005, and hope someone can help. My database has a table called MediaFile with the following fields - MediaFileId int identity pk, FileContent image, and FileExtension varchar(5). I've got my web application storing file contents in this table with no problems, and am able to use full-text searching on doc, xls, etc. with no problems - the only file extension not working is PDF. When performing full-text searches on this table for words which I know exist inside of PDF files saved in the table, these files are not returned in the search results.

The OS is Windows Server 2003 SP2, and I've installed Adobe iFilter 6.0. Following the instructions on this blog entry, I executed the following commands:

exec sp_fulltext_service 'load_os_resources', 1;
exec sp_fulltext_service 'verify_signature', 0;

After this, I restarted the SQL Server, and verified that the iFilter for the PDF extensions is installed correctly by executing the following command:

select document_type, path from sys.fulltext_document_types where document_type = '.pdf' 

This returns the following information, which looks correct:

document_type: .pdf
path: C:\Program Files\Adobe\PDF IFilter 6.0\PDFFILT.dll
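A slightly wider query against the same catalog view (not from the original post) also returns the filter's version and manufacturer columns. This can help confirm that the Adobe DLL, rather than some other registered filter, is the one SQL Server will use for the .pdf extension.

```sql
-- Show the filter registered for .pdf together with its version and
-- manufacturer, as recorded in the sys.fulltext_document_types catalog view.
SELECT document_type, path, version, manufacturer
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';
```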

Then I (re)created the index on the MediaFile table, selecting FileContent as the column to index and the FileExtension as its type. The wizard creates the index and completes successfully. To test, I'm performing a search like this:

SELECT MediaFileId, FileExtension FROM MediaFile WHERE CONTAINS(*, '"house"');

This returns DOC files which contain this term, but not any PDF files, although I know that there are definitely PDF files in the table which contain the word house.

Incidentally, I got this working once for a few minutes, where the search above returned the correct PDF files, but then it just stopped working again for no apparent reason.
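One way to narrow down symptoms like these (a diagnostic sketch, not something suggested in the thread) is to read the per-table full-text counters that SQL Server 2005 exposes through OBJECTPROPERTYEX. If DOC rows index fine but PDF rows are being rejected by the filter, one would expect FailCount to be non-zero. The dbo schema is an assumption.

```sql
-- Full-text population counters for the MediaFile table.
-- Assumes the table lives in the dbo schema.
DECLARE @tbl int;
SET @tbl = OBJECT_ID('dbo.MediaFile');  -- SQL Server 2005 has no inline DECLARE initializer

SELECT
    -- 0 = idle, 1 = full population in progress; other values are other states
    OBJECTPROPERTYEX(@tbl, 'TableFulltextPopulateStatus') AS PopulateStatus,
    -- rows successfully full-text indexed
    OBJECTPROPERTYEX(@tbl, 'TableFulltextItemCount')      AS ItemCount,
    -- rows processed since full-text indexing started
    OBJECTPROPERTYEX(@tbl, 'TableFulltextDocsProcessed')  AS DocsProcessed,
    -- rows that full-text search did not index
    OBJECTPROPERTYEX(@tbl, 'TableFulltextFailCount')      AS FailCount;
```

Running this while the index is idle (PopulateStatus = 0) gives the most meaningful counts.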

Any ideas as to what could be stopping SQL Server 2005 from indexing PDFs, even though Adobe iFilter is installed and appears to be loaded?

Answer

Thanks Ivan. Managed to eventually get this working by starting everything from scratch. It seems like the order in which things are done makes a big difference, and the advice given on the linked blog to turn off the 'load_os_resources' setting after loading the iFilter probably isn't the best option, as this will cause the iFilter to not be loaded when the SQL Server is restarted.
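Whether load_os_resources (and verify_signature) are actually in effect after a restart can be checked with FULLTEXTSERVICEPROPERTY. For the setup described here, one would expect LoadOSResources = 1 and VerifySignature = 0; a minimal check might look like this:

```sql
-- LoadOSResources: 1 = OS filters and word breakers are loaded,
--                  0 = only filters/word breakers specific to this instance.
-- VerifySignature: 1 = only signed binaries are loaded,
--                  0 = signatures are not verified.
SELECT
    FULLTEXTSERVICEPROPERTY('LoadOSResources') AS LoadOSResources,
    FULLTEXTSERVICEPROPERTY('VerifySignature') AS VerifySignature;
```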

If I recall correctly, the sequence of steps that eventually worked for me was as follows:

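  1. Make sure the table has no full-text index (if it does, drop it)
  2. Install the Adobe iFilter
  3. Execute the command exec sp_fulltext_service 'load_os_resources', 1;
  4. Execute the command exec sp_fulltext_service 'verify_signature', 0;
  5. Restart SQL Server
  6. Verify that the PDF iFilter is installed
  7. Create the full-text index on the table
  8. Do a full reindex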
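The sequence above can be sketched as a T-SQL script. It assumes the table is dbo.MediaFile, that a full-text catalog named MediaCatalog already exists, and that the primary-key index on MediaFileId is called PK_MediaFile; the catalog and index names are hypothetical, so substitute the real ones. Steps 2 and 5 happen outside SQL, so they appear only as comments.

```sql
-- Step 1: drop any existing full-text index on the table.
IF EXISTS (SELECT 1 FROM sys.fulltext_indexes
           WHERE object_id = OBJECT_ID('dbo.MediaFile'))
    DROP FULLTEXT INDEX ON dbo.MediaFile;
GO

-- Step 2: install the Adobe PDF iFilter (outside SQL Server), then continue.

-- Steps 3 and 4: load OS filters, and do not require signed binaries.
EXEC sp_fulltext_service 'load_os_resources', 1;
EXEC sp_fulltext_service 'verify_signature', 0;
GO

-- Step 5: restart SQL Server here, then run the remaining batches.

-- Step 6: the .pdf filter should appear in this result.
SELECT document_type, path
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';
GO

-- Step 7: create the full-text index. FileExtension is the TYPE COLUMN that
-- tells the engine which filter to apply to each row's FileContent.
-- MediaCatalog and PK_MediaFile are hypothetical names.
-- CHANGE_TRACKING OFF, NO POPULATION defers indexing to step 8.
CREATE FULLTEXT INDEX ON dbo.MediaFile
    (FileContent TYPE COLUMN FileExtension)
    KEY INDEX PK_MediaFile
    ON MediaCatalog
    WITH CHANGE_TRACKING OFF, NO POPULATION;
GO

-- Step 8: full population (the "full reindex").
ALTER FULLTEXT INDEX ON dbo.MediaFile START FULL POPULATION;
GO
```

Creating the index with CHANGE_TRACKING OFF, NO POPULATION keeps steps 7 and 8 separate, as in the list. The trade-off is that rows added later are not indexed automatically; creating the index WITH CHANGE_TRACKING AUTO instead starts a full population by itself and keeps tracking changes afterwards.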

Although this did the trick, I'm quite sure I performed these steps a few times before it eventually started working properly.
