ElasticSearch - return the complete value of a facet for a query


Question

I've recently started using ElasticSearch and am working through some use cases. I have a problem with one of them.

I have indexed some users with their full names (e.g. "Jean-Paul Gautier", "Jean De La Fontaine").

I am trying to get all the full names that match a given query.

For example, I want the 100 most frequent full names beginning with "J":

{
  "query": {
    "query_string": { "query": "full_name:J*" }
  },
  "facets":{
    "name":{
      "terms":{
        "field": "full_name",
        "size":100
      }
    }
  }
}

The result I get is the individual words of the full names: "Jean", "Paul", "Gautier", "De", "La", "Fontaine".

How can I get "Jean-Paul Gautier" and "Jean De La Fontaine" (all the full_name values beginning with 'J')? The "post_filter" option does not do this; it only restricts the subset above.

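  • Do I have to configure how the full_name facet works?
  • Do I have to add some options to the current query?
  • Do I have to define some mapping (this is still unclear to me)?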

Thanks

Recommended answer

You just need to set "index": "not_analyzed" on the field, and you will be able to get back the full, unmodified field values in your facet.

Typically, it's nice to have one version of the field that isn't analyzed (for faceting) and another that is (for searching). The "multi_field" field type is useful for this.

So in this case, I can define a mapping as follows:

curl -XPUT "http://localhost:9200/test_index/" -d'
{
   "mappings": {
      "people": {
         "properties": {
            "full_name": {
               "type": "multi_field",
               "fields": {
                  "untouched": {
                     "type": "string",
                     "index": "not_analyzed"
                  },
                  "full_name": {
                     "type": "string"
                  }
               }
            }
         }
      }
   }
}'

Here we have two sub-fields. The one with the same name as the parent will be the default, so if you search against the "full_name" field, Elasticsearch will actually use "full_name.full_name". "full_name.untouched" will give you the facet results you want.

So next I add two documents:

curl -XPUT "http://localhost:9200/test_index/people/1" -d'
{
   "full_name": "Jean-Paul Gautier"
}'

curl -XPUT "http://localhost:9200/test_index/people/2" -d'
{
   "full_name": "Jean De La Fontaine"
}'

And then I can facet on each field to see what is returned:

curl -XPOST "http://localhost:9200/test_index/_search" -d'
{
   "size": 0,
   "facets": {
      "name_terms": {
         "terms": {
            "field": "full_name"
         }
      },
      "name_untouched": {
         "terms": {
            "field": "full_name.untouched",
            "size": 100
         }
      }
   }
}'

And I get back the following:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "facets": {
      "name_terms": {
         "_type": "terms",
         "missing": 0,
         "total": 7,
         "other": 0,
         "terms": [
            {
               "term": "jean",
               "count": 2
            },
            {
               "term": "paul",
               "count": 1
            },
            {
               "term": "la",
               "count": 1
            },
            {
               "term": "gautier",
               "count": 1
            },
            {
               "term": "fontaine",
               "count": 1
            },
            {
               "term": "de",
               "count": 1
            }
         ]
      },
      "name_untouched": {
         "_type": "terms",
         "missing": 0,
         "total": 2,
         "other": 0,
         "terms": [
            {
               "term": "Jean-Paul Gautier",
               "count": 1
            },
            {
               "term": "Jean De La Fontaine",
               "count": 1
            }
         ]
      }
   }
}

As you can see, the analyzed field returns single-word, lower-cased tokens (when you don't specify an analyzer, the standard analyzer is used), and the un-analyzed sub-field returns the unmodified original text.
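
To make the difference concrete, here is a small Python sketch that imitates the two facets locally, without a running cluster. It is only an approximation: the `analyze` function is a rough stand-in for the standard analyzer (split on non-word characters, lower-case), and `terms_facet` is a hypothetical helper that counts documents per term. Neither is part of Elasticsearch.

```python
import re
from collections import Counter

# The two documents indexed in the answer above.
DOCS = ["Jean-Paul Gautier", "Jean De La Fontaine"]


def analyze(text):
    """Rough stand-in for the standard analyzer: split into word tokens and lower-case them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def terms_facet(docs, analyzed):
    """Count, for each term, how many documents contain it.

    analyzed=True  imitates faceting on the analyzed field ("full_name").
    analyzed=False imitates faceting on the not_analyzed sub-field
    ("full_name.untouched"), where the whole value is a single term.
    """
    counts = Counter()
    for value in docs:
        terms = set(analyze(value)) if analyzed else {value}
        counts.update(terms)
    return counts


if __name__ == "__main__":
    print(sorted(terms_facet(DOCS, analyzed=True).items()))
    print(sorted(terms_facet(DOCS, analyzed=False).items()))
```

With the analyzed field, "jean" is counted in both documents and the full names never appear as terms; with the un-analyzed values, each full name is its own term.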

Here is a runnable example you can play with: http://sense.qbox.io/gist/7abc063e2611846011dd874648fd1b77450b19a5
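
To tie this back to the original question (the 100 most frequent full names beginning with "J"), one possible request body combines a prefix query with the facet on the un-analyzed sub-field. The answer above does not show this step, so treat the following as a sketch under assumptions: it targets the same facets API used in the answer, it assumes the "full_name.untouched" mapping defined earlier, and `build_prefix_facet_request` is a hypothetical helper name. It only builds and prints the JSON; sending it to the `_search` endpoint is left out.

```python
import json


def build_prefix_facet_request(prefix, field="full_name.untouched", size=100):
    """Build a search body that restricts hits to values starting with `prefix`
    and facets on the same un-analyzed field, so whole names come back as terms."""
    return {
        "size": 0,  # only the facet is of interest, not the hits themselves
        "query": {"prefix": {field: prefix}},
        "facets": {
            "name_untouched": {
                "terms": {"field": field, "size": size}
            }
        },
    }


if __name__ == "__main__":
    body = build_prefix_facet_request("J")
    print(json.dumps(body, indent=3))
```

Because the sub-field is not analyzed, the prefix is matched against the stored value as-is, so "J" and "j" are different prefixes here.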
