Searching and Sorting on ElasticSearch


Problem Description





I am implementing Elasticsearch (ES) as my search engine, and I have some questions about what I am doing wrong.

The main idea is to have a set of posts that I can search over some text fields, sorting the results by relevance and then by creation date (one of the fields). I'm using Node.js with the default ES client library.

Here is my mapping:

{
  "version": 1,
  "conf": {
    "settings": {
      "analysis": {
        "filter": {
          "snowball": {
            "type": "snowball",
            "language": "English"
          },
          "english_stemmer": {
            "type": "stemmer",
            "language": "english"
          },
          "english_possessive_stemmer": {
            "type": "stemmer",
            "language": "possessive_english"
          },
          "stopwords": {
            "type": "stop",
            "stopwords": ["_english_"]
          },
          "worddelimiter": {
            "type": "word_delimiter"
          }
        },
        "tokenizer": {
          "nGram": {
            "type": "nGram",
            "min_gram": 3,
            "max_gram": 20
          }
        },
        "analyzer": {
          "custom_analyzer": {
            "type": "custom",
            "tokenizer": "nGram",
            "filter": [
              "stopwords",
              "asciifolding",
              "lowercase",
              "snowball",
              "english_stemmer",
              "english_possessive_stemmer",
              "worddelimiter"
            ]
          },
          "custom_search_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "stopwords",
              "asciifolding",
              "lowercase",
              "snowball",
              "english_stemmer",
              "english_possessive_stemmer",
              "worddelimiter"
            ]
          }
        }
      }
    },
    "mappings": {
      "posts": {
        "model": "Post",
        "properties": {
          "id": {
            "type": "long"
          },
          "title": {
            "type": "string",
            "analyzer": "custom_analyzer",
            "boost": 5
          },
          "description": {
            "type": "string",
            "analyzer": "custom_analyzer",
            "boost": 4
          },
          "categories": {
            "type": "string",
            "analyzer": "custom_analyzer"
          },
          "seller": {
            "type": "object",
            "properties": {
              "id": {
                "type": "long"
              },
              "username": {
                "type": "string",
                "analyzer": "custom_analyzer",
                "boost": 1
              },
              "firstName": {
                "type": "string",
                "analyzer": "custom_analyzer",
                "boost": 3
              },
              "lastName": {
                "type": "string",
                "analyzer": "custom_analyzer",
                "boost": 2
              }
            }
          },
          "marketPrice": {
            "type": "float"
          },
          "currentPrice": {
            "type": "float"
          },
          "discount": {
            "type": "float"
          },
          "commentsCount": {
            "type": "integer",
            "index": "not_analyzed"
          },
          "likesCount": {
            "type": "integer",
            "index": "not_analyzed"
          },
          "created": {
            "type": "date",
            "index": "not_analyzed"
          },
          "modified": {
            "type": "date",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}

I have indexed 10 documents:

  | id | title   | description         | market_price | item_condition | iso | comment_count | created     |
  | 1  | Post 1  | Post 1 Description  | 1            | 1              | 1   | 1             | 2014/01/01  |
  | 2  | Post 2  | Post 2 Description  | 1            | 1              | 1   | 1             | 2014/01/02  |
  | 3  | Post 3  | Post 3 Description  | 1            | 1              | 1   | 1             | 2014/01/03  |
  | 4  | Post 4  | Post 4 Description  | 1            | 1              | 1   | 1             | 2014/01/04  |
  | 5  | Post 5  | Post 5 Description  | 1            | 1              | 1   | 1             | 2014/01/05  |
  | 6  | Post 6  | Post 6 Description  | 1            | 1              | 1   | 1             | 2014/01/06  |
  | 7  | Post 7  | Post 7 Description  | 1            | 1              | 1   | 1             | 2014/01/07  |
  | 8  | Post 8  | Post 8 Description  | 1            | 1              | 1   | 1             | 2014/01/08  |
  | 9  | Post 9  | Post 9 Description  | 1            | 1              | 1   | 1             | 2014/01/09  |
  | 10 | Post 10 | Post 10 Description | 1            | 1              | 1   | 1             | 2014/01/10  |
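
For context, here is a minimal sketch of how documents like these could be bulk-indexed from Node.js, assuming the legacy `elasticsearch` npm client and the `clamour_develop` index / `posts` type used elsewhere in this post (the host and the exact field set are placeholders):

  var elasticsearch = require('elasticsearch');
  var client = new elasticsearch.Client({ host: 'localhost:9200' }); // host is an assumption

  // The bulk body alternates an action line and the document to index.
  client.bulk({
    body: [
      { index: { _index: 'clamour_develop', _type: 'posts', _id: 1 } },
      { id: 1, title: 'Post 1', description: 'Post 1 Description', created: '2014-01-01' },
      { index: { _index: 'clamour_develop', _type: 'posts', _id: 2 } },
      { id: 2, title: 'Post 2', description: 'Post 2 Description', created: '2014-01-02' }
      // ...remaining posts follow the same pattern
    ]
  }, function (err, resp) {
    if (err) console.error(err);
  });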

Assume that the seller info is there too; I don't include it here because the post would get too long.

My query is:

GET /clamour_develop/_search
{
     "query": {
         "multi_match": {
         "query":    "post 1",
         "fields":   [ "title", "description", "seller.first_name", "seller.last_name",     "seller.username" ],
         "analyzer": "custom_search_analyzer"
         }
     },
     "sort": [
       {
         "_score":{
           "order": "desc"
         }
       },{
         "created": {
           "order": "desc"
         }
       }
     ]
 }
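
Since I issue this from Node.js, the same request looks roughly like the sketch below (assuming the legacy `elasticsearch` npm client; the host and callback handling are placeholders):

  var elasticsearch = require('elasticsearch');
  var client = new elasticsearch.Client({ host: 'localhost:9200' }); // host is an assumption

  client.search({
    index: 'clamour_develop',
    type: 'posts',
    body: {
      query: {
        multi_match: {
          query: 'post 1',
          fields: ['title', 'description', 'seller.first_name', 'seller.last_name', 'seller.username'],
          analyzer: 'custom_search_analyzer'
        }
      },
      sort: [
        { _score: { order: 'desc' } },
        { created: { order: 'desc' } }
      ]
    }
  }, function (err, resp) {
    if (err) return console.error(err);
    // Hits come back in the order given by the sort clause above.
    resp.hits.hits.forEach(function (hit) {
      console.log(hit._source.title, hit._score);
    });
  });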

I expect to receive the documents in this order:

 Post 1
 Post 10
 Post 9
 Post 8
 Post 7
 Post 6
 Post 5
 Post 4
 Post 3
 Post 2

But I get:

 Post 1
 Post 10
 Post 8
 Post 3
 Post 9
 Post 7
 Post 6
 Post 4
 Post 2
 Post 5

EDIT:

https://gist.github.com/bitgandtter/5d3419840fd0508ce356

What am I doing wrong?

Solution

After reading the ES documentation more carefully, I found that I can solve the issue by enabling the dfs_query_then_fetch search type. With the default query_then_fetch mode, relevance scores are computed per shard from that shard's local term statistics, so documents with near-identical content can end up with slightly different scores depending on which shard holds them; dfs_query_then_fetch gathers global term statistics first, which evens the scores out. I know it's not considered good practice, but for small amounts of data it can be helpful.
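
In practice this just means passing the search type along with the request; a minimal sketch, assuming the legacy `elasticsearch` npm client (the plain REST equivalent is shown in the comments):

  // REST equivalent: GET /clamour_develop/_search?search_type=dfs_query_then_fetch
  // with the same query body as in the question.

  var elasticsearch = require('elasticsearch');
  var client = new elasticsearch.Client({ host: 'localhost:9200' }); // host is an assumption

  client.search({
    index: 'clamour_develop',
    type: 'posts',
    searchType: 'dfs_query_then_fetch', // collect global term statistics before scoring
    body: { /* same multi_match + sort body as in the question */ }
  }, function (err, resp) {
    if (err) console.error(err);
  });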

One strategy is to enable this mode while the project is young and, once the ES index grows, switch back to the default mode and keep working with it.
