Elasticsearch distinct filter values


Problem description


      I have a large document store in Elasticsearch and would like to retrieve the distinct filter values for display in HTML drop-downs.

      An example would be something like:

      [
          {
              "name": "John Doe",
              "departments": [
                  {
                      "name": "Accounts"
                  },
                  {
                      "name": "Management"
                  }
              ]
          },
          {
              "name": "Jane Smith",
              "departments": [
                  {
                      "name": "IT"
                  },
                  {
                      "name": "Management"
                  }
              ]
          }
      ]

      The drop-down should contain the list of departments, i.e. IT, Accounts and Management.

      Would some kind person please point me in the right direction for retrieving a distinct list of departments from Elasticsearch?

      Thanks

      Solution

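      This is a job for a terms aggregation (documentation: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html).

      You can get the distinct department values like this: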

      POST company/employee/_search
      {
        "size":0,
        "aggs": {
          "by_departments": {
            "terms": {
              "field": "departments.name",
              "size": 0 //see note 1
            }
          }
        }
      }
      

      Which, in your example, outputs:

      {
         ...
         "aggregations": {
            "by_departments": {
               "buckets": [
                  {
                     "key": "management", //see note 2
                     "doc_count": 2
                  },
                  {
                     "key": "accounts",
                     "doc_count": 1
                  },
                  {
                     "key": "it",
                     "doc_count": 1
                  }
               ]
            }
         }
      }
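
To get from a response like the one above to a plain list of drop-down values, the bucket keys have to be pulled out of the aggregation. The following is a minimal Python sketch of that step, not part of the original answer. The function names, the `max_buckets` default of 100, and the hand-written `sample_response` are assumptions for illustration; the request body mirrors the query above but uses an explicit bucket limit instead of 0.

```python
from typing import Any, Dict, List


def build_distinct_values_query(
    field: str, agg_name: str = "by_departments", max_buckets: int = 100
) -> Dict[str, Any]:
    """Build a search body that returns only a terms aggregation on `field`."""
    return {
        "size": 0,  # skip the search hits, we only want the aggregation
        "aggs": {
            agg_name: {
                # max_buckets is a hypothetical default; tune it to your data
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


def extract_distinct_values(
    response: Dict[str, Any], agg_name: str = "by_departments"
) -> List[str]:
    """Return the bucket keys of a terms aggregation, in response order."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [bucket["key"] for bucket in buckets]


# Hand-written sample shaped like the output shown above (not a live result).
sample_response = {
    "aggregations": {
        "by_departments": {
            "buckets": [
                {"key": "management", "doc_count": 2},
                {"key": "accounts", "doc_count": 1},
                {"key": "it", "doc_count": 1},
            ]
        }
    }
}

if __name__ == "__main__":
    print(build_distinct_values_query("departments.name"))
    print(extract_distinct_values(sample_response))
```

The body returned by `build_distinct_values_query` can be sent with whichever HTTP or Elasticsearch client you already use; the sketch deliberately stays client-agnostic.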
      

      Two additional notes:

      1. Setting size to 0 sets the maximum number of buckets to Integer.MAX_VALUE. Don't use it if there are too many distinct department values.
      2. As you can see, the keys are the terms produced by analyzing the department values, not the original values. Be sure to run your terms aggregation on a field mapped as not_analyzed.

      For example, with our default mapping (departments.name is an analyzed string), adding this employee:

      {
        "name": "Bill Gates",
        "departments": [
          {
            "name": "IT"
          },
          {
            "name": "Human Resource"
          }
        ]
      }
      

      will cause this kind of result:

      {
         ...
         "aggregations": {
            "by_departments": {
               "buckets": [
                  {
                     "key": "it",
                     "doc_count": 2
                  },
                  {
                     "key": "management",
                     "doc_count": 2
                  },
                  {
                     "key": "accounts",
                     "doc_count": 1
                  },
                  {
                     "key": "human",
                     "doc_count": 1
                  },
                  {
                     "key": "resource",
                     "doc_count": 1
                  }
               ]
            }
         }
      }
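
The split into "human" and "resource" can be reproduced without a cluster. The sketch below is a rough simulation only: it assumes the analyzer does nothing more than lowercase the value and split it on whitespace, which is a simplification of what Elasticsearch's analyzers really do. The `count_terms` helper and the `employees` list are hypothetical names introduced for this illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def count_terms(documents: Iterable[dict], analyzed: bool) -> Dict[str, int]:
    """Count, per term, how many documents contain it (like doc_count)."""
    counts: Counter = Counter()
    for doc in documents:
        terms = set()  # a document counts once per term, even if repeated
        for department in doc["departments"]:
            name = department["name"]
            if analyzed:
                # Simplified stand-in for analysis: lowercase + whitespace split
                terms.update(name.lower().split())
            else:
                # not_analyzed: the whole value is indexed as one term
                terms.add(name)
        counts.update(terms)
    return dict(counts)


# The three employees used in this answer.
employees = [
    {"name": "John Doe", "departments": [{"name": "Accounts"}, {"name": "Management"}]},
    {"name": "Jane Smith", "departments": [{"name": "IT"}, {"name": "Management"}]},
    {"name": "Bill Gates", "departments": [{"name": "IT"}, {"name": "Human Resource"}]},
]

if __name__ == "__main__":
    print(count_terms(employees, analyzed=True))
    print(count_terms(employees, analyzed=False))
```

With `analyzed=True` the multi-word value is broken into separate lowercase terms, while `analyzed=False` keeps each department name intact, which is the behavior a drop-down needs.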
      

      With a correct mapping:

      POST company
      {
        "mappings": {
          "employee": {
            "properties": {
              "name": {
                "type": "string"
              },
              "departments": {
                "type": "object",
                "properties": {
                  "name": {
                    "type": "string",
                    "index": "not_analyzed"
                  }
                }
              }
            }
          }
        }
      }
      

      The same request ends up outputting:

      {
         ...
         "aggregations": {
            "by_departments": {
               "buckets": [
                  {
                     "key": "IT",
                     "doc_count": 2
                  },
                  {
                     "key": "Management",
                     "doc_count": 2
                  },
                  {
                     "key": "Accounts",
                     "doc_count": 1
                  },
                  {
                     "key": "Human Resource",
                     "doc_count": 1
                  }
               ]
            }
         }
      }
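
Once the keys come back unmodified, turning them into the HTML drop-down from the question is a small step. This is one possible sketch, assuming the values are rendered server-side in Python; `render_select` and the `department` field name are hypothetical, and the values are sorted alphabetically here rather than by document count.

```python
from html import escape
from typing import Iterable


def render_select(name: str, values: Iterable[str]) -> str:
    """Render a <select> element with one escaped <option> per value."""
    options = "\n".join(
        # Escape both the attribute and the label so values like "R&D" are safe
        f'  <option value="{escape(value, quote=True)}">{escape(value)}</option>'
        for value in sorted(values)
    )
    return f'<select name="{escape(name, quote=True)}">\n{options}\n</select>'


if __name__ == "__main__":
    keys = ["IT", "Management", "Accounts", "Human Resource"]
    print(render_select("department", keys))
```

Escaping matters because department names are user data and may contain characters such as `&` or `"`.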
      

      Hope this helps!
