Phrase Search using Elasticsearch Showing Unrelated Results

Problem description

I'm using Elasticsearch 1.4.1 on an Ubuntu Linux machine to provide search for a Django 1.5 site, using Haystack 2.3.1. I have my search indexes set up using EdgeNGram fields for the document text, and other than some filtering of the searchqueryset in the SearchView, I have a pretty standard setup (I think :) ).

The issue I'm having is that phrase searches (quoted searches) work fine except in certain cases, for example: "1G chicken" (just made up, but it exemplifies the issue). What it seems to do is ignore the 1G and simply turn it into a search for "chicken". Is this expected? Is there a way to force Elasticsearch to honor the phrase?

Here is the query itself from the slow log:

[2014-12-09 17:09:19,373][WARN ][index.search.slowlog.fetch] [Advisor] [haystack][4] took[3.3ms], took_millis[3], types[modelresult], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"filtered":{"filter":{"terms":{"django_ct":["objectives.objective","actions.action","attachments.file","projects.project","toolkits.toolkit"]}},"query":{"query_string":{"auto_generate_phrase_queries":true,"default_operator":"AND","analyze_wildcard":true,"query":"(organization_id:(\"2\" OR \"3\" OR \"6\" OR \"40\" OR \"170\" OR \"171\" OR \"172\" OR \"173\" OR \"174\") AND (\"1G Chicken\"))","default_field":"text"}}}},"from":0,"size":15}], extra_source[],

The "organization_id" is related to the SQS filtering in the SearchView that I mentioned originally.

Also note I've tried things like manually setting the fuzziness to 0, but that doesn't seem to help.

Any ideas?

Solution

With a min_gram of three, the n-gram filter only emits tokens of three characters or more, so "1G" never makes it into the index and is effectively ignored.
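
To see why the short token disappears, here is a minimal Python sketch of what a front edge n-gram filter does. It is a simplified simulation, not Elasticsearch's actual implementation: the names `edge_ngrams` and `analyze` are made up for illustration, and the whitespace split plus lowercasing stands in for whatever tokenizer the index really uses.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the front edge n-grams of a token.

    A token shorter than min_gram yields no grams at all, which is
    the behaviour described in the answer above.
    """
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=3, max_gram=15):
    """Very rough stand-in for an analyzer: lowercase, split on
    whitespace, then expand each token into edge n-grams."""
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


# With min_gram=3 the two-character token "1g" produces nothing.
print(analyze("1G chicken", min_gram=3))
# ['chi', 'chic', 'chick', 'chicke', 'chicken']

# With min_gram=2 the token "1g" survives.
print(analyze("1G chicken", min_gram=2))
# ['1g', 'ch', 'chi', 'chic', 'chick', 'chicke', 'chicken']
```

In the first case only grams of "chicken" remain, which would match the symptom in the question: the phrase behaves like a search for "chicken" alone.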

You can either decrease the min_gram to a smaller length or switch to another analyzer.
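
As a rough sketch of the first option, index analysis settings along the following lines lower min_gram to 2 so that two-character tokens such as "1G" are kept. The filter and analyzer names (`short_edgengram`, `short_edgengram_analyzer`) are hypothetical, and the choice of the `standard` tokenizer is an assumption. How these settings reach Elasticsearch depends on your setup and is not shown here; with Haystack it generally means customizing the index settings the backend uses.

```python
# Hypothetical analysis settings for an Elasticsearch 1.x index.
# Only the structure matters here; the names are illustrative.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "short_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 2,   # lowered so "1g" is indexed
                    "max_gram": 15,
                },
            },
            "analyzer": {
                "short_edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",  # assumption: keeps digits and letters together
                    "filter": ["lowercase", "short_edgengram"],
                },
            },
        }
    }
}
```

Analysis settings are applied at index time, so existing documents would need to be reindexed before the change has any effect on search results.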
