Using full-text search with PDF files in SQL Server 2005


Problem description

I've got a strange problem with indexing PDF files in SQL Server 2005, and hope someone can help. My database has a table called MediaFile with the following fields - MediaFileId int identity pk, FileContent image, and FileExtension varchar(5). I've got my web application storing file contents in this table with no problems, and am able to use full-text searching on doc, xls, etc with no problems - the only file extension not working is PDF. When performing full-text searches on this table for words which I know exist inside of PDF files saved in the table, these files are not returned in the search results.

The OS is Windows Server 2003 SP2, and I've installed Adobe iFilter 6.0. Following the instructions on this blog entry, I executed the following commands:

exec sp_fulltext_service 'load_os_resources', 1;
exec sp_fulltext_service 'verify_signature', 0;

After this, I restarted the SQL Server, and verified that the iFilter for the PDF extensions is installed correctly by executing the following command:

select document_type, path from sys.fulltext_document_types where document_type = '.pdf' 

This returns the following information, which looks correct:

document_type: .pdf
path: C:\Program Files\Adobe\PDF IFilter 6.0\PDFFILT.dll

Then I (re)created the index on the MediaFile table, selecting FileContent as the column to index and the FileExtension as its type. The wizard creates the index and completes successfully. To test, I'm performing a search like this:

SELECT MediaFileId, FileExtension FROM MediaFile WHERE CONTAINS(*, '"house"');

This returns DOC files which contain this term, but not any PDF files, although I know that there are definitely PDF files in the table which contain the word house.
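One way to narrow this down, offered as a rough sketch rather than a confirmed diagnosis, is to ask SQL Server how the last population of the index went. The queries below read the full-text table properties through `OBJECTPROPERTYEX`; the `dbo` schema is an assumption, since the question does not state it.

```sql
-- Assumption: the table lives in the dbo schema (not stated in the question).
SELECT
    OBJECTPROPERTYEX(OBJECT_ID('dbo.MediaFile'), 'TableFulltextPopulateStatus') AS PopulateStatus, -- 0 means idle
    OBJECTPROPERTYEX(OBJECT_ID('dbo.MediaFile'), 'TableFulltextDocsProcessed')  AS DocsProcessed,
    OBJECTPROPERTYEX(OBJECT_ID('dbo.MediaFile'), 'TableFulltextFailCount')      AS FailCount,
    OBJECTPROPERTYEX(OBJECT_ID('dbo.MediaFile'), 'TableFulltextItemCount')      AS ItemCount;

-- For comparison: how many rows exist per stored extension.
SELECT FileExtension, COUNT(*) AS Files
FROM dbo.MediaFile
GROUP BY FileExtension;
```

If `FailCount` is close to the number of PDF rows, that would point at the filter failing during the crawl rather than at the query itself. The second query also shows exactly what is stored in `FileExtension`, which is worth comparing against the `document_type` value returned above.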

Incidentally, I got this working once for a few minutes, where the search above returned the correct PDF files, but then it just stopped working again for no apparent reason.

Any ideas as to what could be stopping SQL Server 2005 from indexing PDFs, even though Adobe iFilter is installed and appears to be loaded?

Solution

Thanks Ivan. Managed to eventually get this working by starting everything from scratch. It seems like the order in which things are done makes a big difference, and the advice given on the linked blog to turn off the 'load_os_resources' setting after loading the iFilter probably isn't the best option, as this will cause the iFilter to not be loaded when the SQL Server is restarted.
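To see whether these settings are still in effect after a restart, the current values can be read back with `FULLTEXTSERVICEPROPERTY`. This is only a quick check, not part of the original answer.

```sql
-- Read back the two service-level settings changed with sp_fulltext_service.
SELECT
    FULLTEXTSERVICEPROPERTY('LoadOSResources') AS LoadOSResources, -- 1 = OS-registered filters and word breakers are loaded
    FULLTEXTSERVICEPROPERTY('VerifySignature') AS VerifySignature; -- 0 = unsigned components are allowed to load
```

For the Adobe iFilter setup described here, the values to expect would be `LoadOSResources = 1` and `VerifySignature = 0`.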

If I recall correctly, the sequence of steps that eventually worked for me was as follows:

  1. Ensure that the table does not have an index already (and if so, delete it)
  2. Install Adobe iFilter
  3. Execute the command exec sp_fulltext_service 'load_os_resources', 1;
  4. Execute the command exec sp_fulltext_service 'verify_signature', 0;
  5. Restart SQL Server
  6. Verify PDF iFilter is installed
  7. Create full-text index on table
  8. Do full re-index
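The T-SQL side of the steps above can be sketched roughly as follows. The answer used the wizard to create the index, so this script is an equivalent under assumptions, not the exact method: the `dbo` schema, the catalog name `MediaFileCatalog`, and the primary key index name `PK_MediaFile` are all hypothetical placeholders to be replaced with the real names.

```sql
-- Step 1: drop any existing full-text index on the table.
IF EXISTS (SELECT 1 FROM sys.fulltext_indexes
           WHERE object_id = OBJECT_ID('dbo.MediaFile'))
    DROP FULLTEXT INDEX ON dbo.MediaFile;
GO

-- Step 2 (install Adobe iFilter) happens outside SQL Server.

-- Steps 3 and 4: let SQL Server load OS-registered, unsigned filters.
EXEC sp_fulltext_service 'load_os_resources', 1;
EXEC sp_fulltext_service 'verify_signature', 0;
GO

-- Step 5: restart the SQL Server service here, then continue.

-- Step 6: confirm the PDF filter is visible to SQL Server.
SELECT document_type, path
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';
GO

-- Step 7: create a catalog and the index.
-- MediaFileCatalog and PK_MediaFile are hypothetical names.
CREATE FULLTEXT CATALOG MediaFileCatalog;
GO
CREATE FULLTEXT INDEX ON dbo.MediaFile
    (FileContent TYPE COLUMN FileExtension)
    KEY INDEX PK_MediaFile
    ON MediaFileCatalog
    WITH CHANGE_TRACKING OFF, NO POPULATION;
GO

-- Step 8: run a full population (full re-index).
ALTER FULLTEXT INDEX ON dbo.MediaFile START FULL POPULATION;
GO
```

Creating the index with `NO POPULATION` and starting the population explicitly keeps steps 7 and 8 separate, which mirrors the order in the list.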

Although this did the trick, I'm quite sure I performed these steps a few times before it eventually started working properly.
