Solr query in a pdf file is not returning highlighting content
Problem description
I installed Solr 6.5.1 on my Debian server today, but I have trouble getting the PDF text content back. Searching works: when I query, for example, my name "juan", the document shows up in the results. However, the highlighting does not appear with each str result as it is supposed to.
Here is the response to an example query:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="hl.snippets">20</str>
<str name="q">juan</str>
<str name="hl">true</str>
<str name="fl">title</str>
<str name="hl.usePhraseHighlighter">true</str>
<str name="hl.fl">content</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<arr name="title">
<str>CV_Juan_Jara_ultimo</str>
</arr>
</doc>
</result>
<lst name="highlighting">
<lst name="/solr-6.5.1/mydocs/CV_Juan_Jara_ultimo.pdf"/>
</lst>
</response>
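For reference, the parameters echoed in responseHeader correspond to a select request like the one below. This is a sketch that assumes the core is "ex" (from the bin/post -c ex command further down) and that Solr listens on localhost:8983 (from the schema call further down). The helper name solr_select_url is hypothetical. The script only prints the URL, and the actual request is left as a commented-out curl call.

```shell
#!/bin/sh
# Sketch: rebuild the select request from the params Solr echoed in responseHeader.
# Assumptions: core "ex" and Solr on localhost:8983. None of these values
# contain characters that need URL-encoding.
SOLR_BASE="http://localhost:8983/solr/ex"

# Hypothetical helper: prints the URL with the same params as <lst name="params">.
solr_select_url() {
  echo "$SOLR_BASE/select?q=juan&fl=title&hl=true&hl.fl=content&hl.snippets=20&hl.usePhraseHighlighter=true&wt=xml"
}

solr_select_url
# With Solr running, send the request with:
# curl "$(solr_select_url)"
```

In a successful response, each document's entry under highlighting would contain an arr named after the hl.fl field, with one str per snippet. Here the entry for the PDF is empty.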
Additionally, the log shows all of the PDF text, so I assume it was indexed correctly (I indexed the PDF with the command: bin/post -c ex mydocs/CV_Juan_Jara_ultimo.pdf).
I added the "content" field to the schema using curl:
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field" : {
"name":"text",
"type":"text_general",
"indexed":"true",
"stored":"false",
"multiValued":"true"
}
}' localhost:8983/solr/ex/schema
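The sentence above names the field "content", while the JSON payload uses "name":"text". One way to see which definitions the core actually holds is to read them back through the Schema API (GET /solr/<core>/schema/fields/<name>), which returns the field's attributes such as type and stored. A minimal sketch follows. It assumes core "ex" on localhost:8983 as in the command above, and field_url is a hypothetical helper. The script prints the URLs, and the curl call is commented out because it needs a running Solr.

```shell
#!/bin/sh
# Sketch: print Schema API URLs for reading back individual field definitions.
# Assumption: core "ex" on localhost:8983, as in the add-field call above.
SCHEMA_BASE="http://localhost:8983/solr/ex/schema/fields"

# Hypothetical helper: GET on this URL returns a single field's definition.
field_url() {
  echo "$SCHEMA_BASE/$1"
}

# Check both names: the one in the prose and the one in the JSON payload.
for field in text content; do
  field_url "$field"
  # With Solr running: curl "$(field_url "$field")"
done
```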
Do you know what is wrong?
All I want to do is search for a topic in my PDF and then get all the results highlighted like this: