SQL Server 2012 - Fulltext search on top of a filetable - PDF not being searched

Problem description

I'm getting my feet wet with handling a load of Office and PDF documents with SQL Server 2012's FILETABLE feature, and using fulltext search on top of that.

I've configured my SQL Server to support fulltext search and filestream, and I've created a FILETABLE, dumped 800+ documents of all sorts into the folder, and that all works nicely.

In order to be able to fulltext index MS Office documents, I've installed the MS Filter Pack 2.0, and to handle the PDF files, I've downloaded Adobe's iFilter for PDF and installed them all.

Now I've created a full text catalog:

CREATE FULLTEXT CATALOG DocumentCatalog
WITH ACCENT_SENSITIVITY = OFF

and then a full text index on the FILETABLE table:

CREATE FULLTEXT INDEX 
ON dbo.Documents(name, file_type, file_stream)
KEY INDEX [PK_Document]
ON DocumentCatalog
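
For comparison, Microsoft's documentation for CREATE FULLTEXT INDEX describes a TYPE COLUMN clause, which tells the indexer which filter to apply to a varbinary(max) column. The following is a minimal sketch of the same index with that clause spelled out, reusing the object names from the statement above. It assumes that file_type holds each document's extension (as it does in a FILETABLE), that English (LCID 1033) is the right language, and that no full-text index exists on the table yet. It drops file_type from the indexed columns for brevity, and it is not necessarily what was actually run here.

```sql
-- Sketch only: the index from above with an explicit TYPE COLUMN for the binary content.
-- Assumptions: LANGUAGE 1033 (English) and CHANGE_TRACKING AUTO are illustrative choices.
CREATE FULLTEXT INDEX
ON dbo.Documents
(
    name        LANGUAGE 1033,
    file_stream TYPE COLUMN file_type LANGUAGE 1033  -- file_type selects the iFilter per row
)
KEY INDEX [PK_Document]
ON DocumentCatalog
WITH CHANGE_TRACKING AUTO;
```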

and that all seemed to work just fine. Once the index had finished populating with the 800+ documents I have, I could start doing searches:

SELECT 
    stream_id, name, file_type, cached_file_size, 
    file_stream.GetFileNamespacePath(1)
FROM 
    dbo.Documents
WHERE
    CONTAINS(*, 'Silverlight')

and stuff that is contained in MS Office documents (*.doc, *.docx, *.ppt, *.pptx, *.xls, *.xlsx) is found quite nicely - and quickly.

Unfortunately, none of the text in the PDF files seems to be found :-(

Any ideas why? I had no errors during setup, and all seems fine - I can see the .pdf file type in the Filters in SQL Server:

SELECT *
FROM sys.fulltext_document_types

which returns:

.pdf    E8978DA6-047F-4E3D-9C78-CDBE46041603    
        C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin\PDFFilter.dll    
        11.0.1.36    Adobe Systems, Inc.

but somehow, those PDFs don't seem to be indexed. Can I somehow find out which files were in fact indexed, and whether or not there was an error during population? Where would I find this information?
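
One way to approach the "what was indexed, and were there errors" part is through SQL Server's built-in full-text metadata functions. The following is a rough sketch, not a verified diagnosis of this particular setup; it reuses the catalog and table names from the question and assumes it is run in the database that contains dbo.Documents.

```sql
-- Sketch only; DocumentCatalog and dbo.Documents are the names used in the question.

-- 1. Catalog-level status: PopulateStatus 0 means idle, ItemCount is the number of indexed items.
SELECT
    FULLTEXTCATALOGPROPERTY('DocumentCatalog', 'PopulateStatus') AS catalog_populate_status,
    FULLTEXTCATALOGPROPERTY('DocumentCatalog', 'ItemCount')      AS catalog_item_count;

-- 2. Table-level counters: rows processed, rows that failed, rows successfully indexed.
SELECT
    OBJECTPROPERTYEX(OBJECT_ID('dbo.Documents'), 'TableFulltextDocsProcessed') AS docs_processed,
    OBJECTPROPERTYEX(OBJECT_ID('dbo.Documents'), 'TableFulltextFailCount')     AS docs_failed,
    OBJECTPROPERTYEX(OBJECT_ID('dbo.Documents'), 'TableFulltextItemCount')     AS items_indexed;

-- 3. Number of distinct documents that contributed at least one keyword to the index.
SELECT COUNT(DISTINCT document_id) AS documents_with_keywords
FROM sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('dbo.Documents'));
```

Because the name column is indexed too, a PDF can show up in the third query through the words in its file name alone, so a non-zero count there does not prove that its content was filtered. Per-document errors from a population are written to the full-text crawl log, a file named SQLFT<DatabaseID><FullTextCatalogID>.LOG in the instance's LOG folder.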

Recommended answer

I had to use Adobe iFilter 9, not 11.

ftp://ftp.adobe.com/pub/adobe/acrobat/win/9.x/PDFiFilter64installer.zip
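
After swapping the filter, SQL Server generally has to reload its filter list and re-index the table before PDF content becomes searchable. The following is a hedged sketch of those steps using documented sp_fulltext_service actions. The answer does not say which steps were actually needed, so treat this as one plausible sequence; restarting the SQL Server service is an alternative to restarting the filter daemon hosts.

```sql
-- Sketch only: pick up a newly installed OS-level iFilter, then re-index the FILETABLE.
EXEC sp_fulltext_service @action = 'load_os_resources', @value = 1;  -- allow filters registered with the OS
EXEC sp_fulltext_service @action = 'restart_all_fdhosts';            -- reload the filter daemon hosts

-- Confirm which filter is now registered for .pdf (the version should no longer be 11.x).
SELECT document_type, path, version, manufacturer
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';

-- Re-run the population so the PDFs are passed through the new filter.
ALTER FULLTEXT INDEX ON dbo.Documents START FULL POPULATION;
```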
