Get page numbers of search results of a PDF in Solr
Question
I'm building a web application where users can search for PDF documents and view them with pdf.js. I would like to display each search result with a short snippet of the paragraph where the search term was found, plus a link that opens the document at the right page.
So what I need is the page number and a short text snippet for every search result.
I'm using Solr 4.1 to index PDF documents. The indexing itself works fine, but I don't know how to get the page number and paragraph of a search result.
I found the question "Indexing PDF with page numbers with Solr", but it wasn't really helpful.
Recommended answer
I'm now splitting the PDF and sending each page to Solr separately. Every page is its own document with the id <id_of_document>_<page_number>, plus an additional field doc_id that contains only the <id_of_document> and is used for grouping the results.
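A minimal sketch of this per-page approach in Python, in case it helps. Only the id pattern and the doc_id field come from the answer above. The field names page and text, the choice of pypdf for text extraction, and the handler URLs passed in by the caller are all assumptions made for illustration. The query helper uses Solr's result grouping and highlighting parameters; highlighting will only return snippets if the text field is stored.

```python
import json
import urllib.parse
import urllib.request


def build_page_docs(doc_id, page_texts):
    """Turn a list of per-page texts into one Solr document per page."""
    docs = []
    for page_number, text in enumerate(page_texts, start=1):
        docs.append({
            "id": f"{doc_id}_{page_number}",  # <id_of_document>_<page_number>
            "doc_id": doc_id,                 # same for all pages, used for grouping
            "page": page_number,              # assumed field name
            "text": text,                     # assumed field name
        })
    return docs


def extract_page_texts(pdf_path):
    """Extract text page by page. pypdf is an assumed choice of library."""
    from pypdf import PdfReader  # third-party, imported only when needed
    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def post_to_solr(docs, update_url):
    """Send the documents as JSON to a Solr update handler and commit."""
    request = urllib.request.Request(
        update_url + "?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def build_search_url(select_url, term):
    """Build a query grouped by doc_id with highlighting on the page text."""
    params = {
        "q": term,
        "df": "text",              # search the assumed page-text field
        "group": "true",
        "group.field": "doc_id",   # one group per original PDF
        "hl": "true",
        "hl.fl": "text",           # snippets come from the page text
        "wt": "json",
    }
    return select_url + "?" + urllib.parse.urlencode(params)
```

With documents indexed this way, each hit already carries its page number, so the link to pdf.js can be built from doc_id and page, and the highlighting section of the response supplies the snippet.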