如何在elastic.js(elasticsearch)中计算具有相同值的字段? [英] How to count of fields with the same value in elastic.js (elasticsearch)?

查看:110
本文介绍了如何在elastic.js(elasticsearch)中计算具有相同值的字段?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个社区列表.而且我需要创建一个聚合查询,该查询将对所有具有相同标题的数据进行计数.

I have a list of communities. And I need to create aggregation query which will count all data which have the same title.

[
  {
    "_id": "56161cb3cbdad2e3b437fdc3",
    "_type": "Comunity",
    "name": "public",
    "data": [
      {
        "title": "sonder",
        "creationDate": "2015-08-22T03:43:28 -03:00",
        "quantity": 0
      },
      {
        "title": "vule",
        "creationDate": "2014-05-17T12:35:01 -03:00",
        "quantity": 0
      },
      {
        "title": "omer",
        "creationDate": "2015-01-31T04:54:19 -02:00",
        "quantity": 3
      },
      {
        "title": "sonder",
        "creationDate": "2014-05-22T05:09:36 -03:00",
        "quantity": 3
      }
    ]
  },
  {
    "_id": "56161cb3dae30517fc133cd9",
    "_type": "Comunity",
    "name": "static",
    "data": [
      {
        "title": "vule",
        "creationDate": "2014-07-01T06:32:06 -03:00",
        "quantity": 5
      },
      {
        "title": "vule",
        "creationDate": "2014-01-10T12:40:28 -02:00",
        "quantity": 1
      },
      {
        "title": "vule",
        "creationDate": "2014-01-09T09:33:11 -02:00",
        "quantity": 3
      }
    ]
  },
  {
    "_id": "56161cb32f62b522355ca3c8",
    "_type": "Comunity",
    "name": "public",
    "data": [
      {
        "title": "vule",
        "creationDate": "2014-02-03T09:55:28 -02:00",
        "quantity": 2
      },
      {
        "title": "vule",
        "creationDate": "2015-01-23T09:14:22 -02:00",
        "quantity": 0
      }
    ]
  }
]

所以欲望的结果应该是

[
  {
    title: vule,
    total: 6
  },
  {
    title: omer,
    total: 1
  },
  {
    title: sonder,
    total: 1
  }
]

我写了一些聚合查询,但仍然无法正常工作.如何获得欲望结果?

I wrote some aggregation queries but it still not work. How can I get desire result?

PS:我试图创建嵌套聚合

PS: I tried to create nested aggregation

ejs.Request().size(0).agg(
        ejs.NestedAggregation('comunities')
            .path('data')
            .agg(
                ejs.FilterAggregation('sonder')
                    .filter(
                    ejs.TermsFilter('data.title', 'sonder')
                ).agg(
                ejs.ValueCountAggregation('counts')
                      .field('data.title')
)
            )
    );

推荐答案

您需要使用现在,根据您的映射,可能有两种方式可以做到这一点:

Now depending on your mapping there could be two ways of doing that:

1.您的数据字段存储为子文档

您需要运行一个简单的术语汇总,在RAW json中如下所示:

You need to run a simple terms aggregation, which in RAW json looks like:

POST /test/test/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "Grouping": {
      "terms": {
        "field": "data.title",
        "size": 0
      }
    }
  }
}

2.您的数据字段存储为嵌套文档

2. Your data field is stored as an nested document

您必须先进行嵌套的子聚合,然后再进行术语聚合.

You have to add a nested subaggregation before doing terms aggregation.

POST /test/test/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "Nest": {
      "nested": {
        "path": "data"
      },
      "aggs": {
        "Grouping": {
          "terms": {
            "field": "data.title",
            "size": 0
          }
        }
      }
    }
  }
}

两者都将输出以下内容:

{
   "took": 125,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "Nest": {
         "doc_count": 9,
         "Grouping": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
               {
                  "key": "vule",
                  "doc_count": 6            -- The Total count you're looking for
               },
               {
                  "key": "sonder",
                  "doc_count": 2
               },
               {
                  "key": "omer",
                  "doc_count": 1
               }
            ]
         }
      }
   }
}

不幸的是,这只是一个原始查询,但我想它可以转换为 elastic.js 很容易.

This, unfortunately, is just a raw query, but I imagine that it can be translated into elastic.js quite easily.

最重要的是.如果要进行聚合,请不要忘记将要进行聚合的字段设置为 not_analyzed ,因为它会像

On top of that. If you're going to do aggregations, don't forget to set your fields, that you're doing aggregations on, as not_analyzed, because it will start counting individual tokens as in documentation

我本人会将这些文档存储为嵌套文档.

I, myself, would store these documens as nested ones.

示例:

映射:

PUT /test
{
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string"
        },
        "data": {
          "type": "nested",
          "properties": {
            "title": {
              "type": "string",
              "index": "not_analyzed",
              "fields": {
                "stemmed": {
                  "type": "string",
                  "analyzed": "standard"
                }
              }
            },
            "creationDate": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "quantity": {
              "type": "integer"
            }
          }
        }
      }
    }
  }
}

测试数据:

PUT /test/test/56161cb3cbdad2e3b437fdc3
{
  "name": "public",
  "data": [
    {
      "title": "sonder",
      "creationDate": "2015-08-22T03:43:28",
      "quantity": 0
    },
    {
      "title": "vule",
      "creationDate": "2014-05-17T12:35:01",
      "quantity": 0
    },
    {
      "title": "omer",
      "creationDate": "2015-01-31T04:54:19",
      "quantity": 3
    },
    {
      "title": "sonder",
      "creationDate": "2014-05-22T05:09:36",
      "quantity": 3
    }
  ]
}

PUT /test/test/56161cb3dae30517fc133cd9
{
  "name": "static",
  "data": [
    {
      "title": "vule",
      "creationDate": "2014-07-01T06:32:06",
      "quantity": 5
    },
    {
      "title": "vule",
      "creationDate": "2014-01-10T12:40:28",
      "quantity": 1
    },
    {
      "title": "vule",
      "creationDate": "2014-01-09T09:33:11",
      "quantity": 3
    }
  ]
}

PUT /test/test/56161cb32f62b522355ca3c8
{
  "name": "public",
  "data": [
    {
      "title": "vule",
      "creationDate": "2014-02-03T09:55:28",
      "quantity": 2
    },
    {
      "title": "vule",
      "creationDate": "2015-01-23T09:14:22",
      "quantity": 0
    }
  ]
}

实际查询:

POST /test/test/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "Nest": {
      "nested": {
        "path": "data"
      },
      "aggs": {
        "Grouping": {
          "terms": {
            "field": "data.title",
            "size": 0
          }
        }
      }
    }
  }
}

P.S."size":0 意味着我让Elasticsearch输出所有可能的术语,而不是按照

P.S. "size":0 means that I'm letting Elasticsearch output all possible terms and not limiting its output as described in documentation.

size 参数可以设置为定义术语桶的数量从总条款列表中退回.默认情况下,该节点协调搜索过程将要求每个分片提供其拥有自己的顶级 size 术语存储桶,一旦所有分片都响应,它将减少结果返回到最终列表,然后将其返回到客户.这意味着,如果唯一项的数量大于 size ,返回的列表略有偏差,不准确(可能是该术语的数量略有减少,甚至可能是一个术语则应该返回最大尺寸的存储桶中).如果设置为 0 size 将设置为 Integer.MAX_VALUE .

The size parameter can be set to define how many term buckets should be returned out of the overall terms list. By default, the node coordinating the search process will request each shard to provide its own top size term buckets and once all shards respond, it will reduce the results to the final list that will then be returned to the client. This means that if the number of unique terms is greater than size, the returned list is slightly off and not accurate (it could be that the term counts are slightly off and it could even be that a term that should have been in the top size buckets was not returned). If set to 0, the size will be set to Integer.MAX_VALUE.

这篇关于如何在elastic.js(elasticsearch)中计算具有相同值的字段?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆