Solr query in a PDF file is not returning highlighting content

Views: 116. Published: 2020/5/25 4:48:48. Tags: pdf, curl, text, solr, highlighting


Problem description

I implemented Solr 6.5.1 today on my Debian server, but I am having trouble getting the PDF text content. Searching works: the document shows up when I query for, say, my name, "juan". However, the highlighted content does not appear with each str result the way it is supposed to.

This is the example query:

This is the result:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="hl.snippets">20</str>
      <str name="q">juan</str>
      <str name="hl">true</str>
      <str name="fl">title</str>
      <str name="hl.usePhraseHighlighter">true</str>
      <str name="hl.fl">content</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="title">
        <str>CV_Juan_Jara_ultimo</str>
      </arr>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="/solr-6.5.1/mydocs/CV_Juan_Jara_ultimo.pdf"/>
  </lst>
</response>
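To see what "no highlighting content" means in this response, here is a minimal Python sketch that reads the highlighting section of a Solr XML response and lists the snippets per document. The function name highlighting_snippets and the constant RESPONSE_XML are hypothetical, and the XML is a trimmed copy of the response above with the responseHeader left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question (responseHeader omitted).
RESPONSE_XML = """
<response>
  <result name="response" numFound="1" start="0">
    <doc><arr name="title"><str>CV_Juan_Jara_ultimo</str></arr></doc>
  </result>
  <lst name="highlighting">
    <lst name="/solr-6.5.1/mydocs/CV_Juan_Jara_ultimo.pdf"/>
  </lst>
</response>
"""


def highlighting_snippets(xml_text):
    """Map each document id to {field name: [snippets]} from the highlighting section."""
    root = ET.fromstring(xml_text.strip())
    section = root.find("./lst[@name='highlighting']")
    snippets = {}
    if section is None:
        return snippets
    for doc in section.findall("lst"):
        fields = {}
        # Each highlighted field appears as <arr name="field"><str>snippet</str>...</arr>.
        for arr in doc.findall("arr"):
            fields[arr.get("name")] = [s.text or "" for s in arr.findall("str")]
        snippets[doc.get("name")] = fields
    return snippets


if __name__ == "__main__":
    for doc_id, fields in highlighting_snippets(RESPONSE_XML).items():
        print(doc_id, "->", fields if fields else "no snippets")
```

For the response in the question, the document id is present but its field map is empty: the query matched the document, yet no snippet was produced for it.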

Additionally, the log shows all the PDF text, so I assume it was indexed correctly (I indexed the PDF using the command: bin/post -c ex mydocs/CV_Juan_Jara_ultimo.pdf).

I added the "content" field to the schema, using curl:

curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-field" : { "name":"text", "type":"text_general", "indexed":"true", "stored":"false", "multiValued":"true" } }' localhost:8983/solr/ex/schema

Do you know what is wrong?

All I want to do is search for a topic in my PDF and then get all results highlighted like this:

Recommended answer

SOLVED: the solution that finally worked for me was to replace the _text_ field in the schema with this curl command:

curl -X POST -H 'Content-type:application/json' --data-binary '{ "replace-field" : { "name":"_text_", "type":"text_general", "indexed":"true", "stored":"true", "multiValued":"true" } }' http://localhost:8983/solr/ex/schema

This is because the _text_ field comes with "stored":"false" by default. Solr builds highlight snippets from a field's stored value, so an unstored field can match a query but yields no snippets.
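One way to confirm this diagnosis before changing the schema is to look at the field definition itself. The sketch below is Python and makes a few assumptions: the helper names stored_flag and fetch_field are hypothetical, the core is the "ex" core from the question, and the field is read through the Schema API path /solr/<core>/schema/fields/<name>. Only fetch_field needs a running Solr; stored_flag works on any field definition dict.

```python
import json
import urllib.request


def stored_flag(field_def):
    """Return True/False for an explicit "stored" setting, or None if it is absent.

    An absent key means the field does not set the property itself,
    so the value comes from its field type and cannot be decided here.
    """
    if "stored" not in field_def:
        return None
    value = field_def["stored"]
    # The curl commands in this thread send the flag as the string "true"/"false".
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def fetch_field(base_url, core, field_name):
    """Fetch one field definition through the Schema API (needs a running Solr)."""
    url = f"{base_url}/solr/{core}/schema/fields/{field_name}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["field"]


if __name__ == "__main__":
    # Field definition as described in the answer, before the replace-field call.
    before = {"name": "_text_", "type": "text_general", "stored": "false"}
    print("stored before replace:", stored_flag(before))
    # Against a live server (assumed address):
    # print(stored_flag(fetch_field("http://localhost:8983", "ex", "_text_")))
```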

NOTE: Remember to index all files into your core again if you indexed them before replacing this schema field.
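If there are several PDFs to index again, the bin/post calls can be generated rather than typed by hand. This is a small Python sketch under stated assumptions: post_commands is a hypothetical helper, the core name "ex" and the mydocs directory come from the question, and the script only prints the commands so they can be reviewed before running.

```python
from pathlib import Path


def post_commands(solr_home, core, docs_dir):
    """Build one bin/post invocation per PDF found directly in docs_dir."""
    post = str(Path(solr_home) / "bin" / "post")
    pdfs = sorted(Path(docs_dir).glob("*.pdf"))
    return [[post, "-c", core, str(pdf)] for pdf in pdfs]


if __name__ == "__main__":
    # Assumes the script is started from the Solr install directory.
    for cmd in post_commands(".", "ex", "mydocs"):
        print(" ".join(cmd))
```

Each printed line has the same shape as the command used in the question, so it can be pasted into a shell or passed to subprocess.run once it looks right.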
