Searching books in Apache Solr


Problem description

I'm very new to Solr and I'm evaluating it. My task is to look for words within a corpus of books and return them within a small context. So far, I'm storing the books in a database split by paragraphs (slicing the books at line breaks); I do a full-text search and return the matching row.

In Solr, would I have to do the same, or can I add the whole book (in .txt format) and, whenever a match is found, return something like the match plus 100 words before and 100 words after? Thanks

Solution

Highlighting will do your bidding. http://wiki.apache.org/solr/HighlightingParameters

Here are relevant options for you:

hl.snippets

The maximum number of highlighted snippets to generate per field.....

hl.fragsize

The size, in characters, of the snippets (aka fragments) created by the highlighter.....
The default value is "100". 

hl.mergeContiguous

Collapse contiguous fragments into a single fragment....

For what you describe, use hl.fl to highlight your text field, set hl.snippets to 5 (or whatever a human can sanely handle), and set hl.fragsize to 400 characters (my approximation of 100 words) so that each snippet covers the text around the word/phrase.
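As a rough illustration of those settings, here is a minimal sketch that builds the query string for such a request. The host, the core name `books`, the field name `text`, and the helper `build_highlight_query` are assumptions made for the example, not something given in the question; `hl=true` is included because highlighting has to be switched on for the other `hl.*` parameters to apply.

```python
from urllib.parse import urlencode


def build_highlight_query(term, field="text", snippets=5, fragsize=400):
    """Return a URL-encoded query string for a Solr search with highlighting.

    The field name and the default values are illustrative assumptions.
    """
    params = {
        "q": f"{field}:{term}",      # search the book text for the term
        "hl": "true",                # turn highlighting on
        "hl.fl": field,              # field to build snippets from
        "hl.snippets": snippets,     # maximum snippets per field
        "hl.fragsize": fragsize,     # snippet size in characters
        "wt": "json",                # ask for a JSON response
    }
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr instance with a core named "books".
    base_url = "http://localhost:8983/solr/books/select"
    print(base_url + "?" + build_highlight_query("whale"))
```

With this approach the whole book can be indexed as one document, and the snippet size is tuned per request rather than fixed at indexing time.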

See also hl.regex.slop for building snippets around phrases and hl.simple.pre/hl.simple.post for markup.
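To show where the snippets and their markup end up, here is a small sketch that reads them out of a parsed JSON response. Solr returns highlights in a `highlighting` section keyed by document id and then by field name. The sample response, the id `book-1`, and the helper `collect_snippets` are made up for illustration; the `<em>` tags are the default values of hl.simple.pre/hl.simple.post.

```python
def collect_snippets(response, field="text"):
    """Map each document id to its list of highlighted snippets for `field`."""
    highlighting = response.get("highlighting", {})
    return {
        doc_id: fields.get(field, [])
        for doc_id, fields in highlighting.items()
    }


# Hand-written sample shaped like the "highlighting" part of a Solr JSON
# response; the id and the text are illustrative only.
sample_response = {
    "highlighting": {
        "book-1": {"text": ["Call me <em>Ishmael</em>."]},
        "book-2": {},
    }
}

if __name__ == "__main__":
    for doc_id, snippets in collect_snippets(sample_response).items():
        print(doc_id, snippets)
```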
