Adding additional fields to ElasticSearch terms aggregation


Problem description

Indexed documents look like this:

{
  id: 1, 
  title: 'Blah',
  ...
  platform: {id: 84, url: 'http://facebook.com', name: 'Facebook'},
  ...
}

What I want is to count documents and output stats by platform. For counting, I can use a terms aggregation with platform.id as the field:

aggs: {
  platforms: {
    terms: {field: 'platform.id'}
  }
}

This way I receive the stats as multiple buckets that look like {key: 8, doc_count: 162511}, as expected.
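As a rough illustration of consuming that response on the client side, here is a minimal Python sketch. The `counts_by_platform` helper and the `sample_response` dict are hypothetical; the sketch assumes the usual response layout in which the buckets of an aggregation named `platforms` sit under `aggregations.platforms.buckets`.

```python
def counts_by_platform(response):
    """Map each platform id (the bucket key) to its document count."""
    buckets = response["aggregations"]["platforms"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hypothetical response fragment, reusing the bucket quoted above.
sample_response = {
    "aggregations": {
        "platforms": {
            "buckets": [{"key": 8, "doc_count": 162511}]
        }
    }
}

print(counts_by_platform(sample_response))
```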

Now, can I somehow also add platform.name and platform.url to those buckets (for pretty output of the stats)? The best I've come up with looks like this:

aggs: {
  platforms: {
    terms: {field: 'platform.id'},
    aggs: {
      name: {terms: {field: 'platform.name'}},
      url: {terms: {field: 'platform.url'}}
    }
  }
}

This does work, but it returns a pretty complicated structure in each bucket:

{key: 7,
  doc_count: 528568,
  url:
   {doc_count_error_upper_bound: 0,
    sum_other_doc_count: 0,
    buckets: [{key: "http://facebook.com", doc_count: 528568}]},
  name:
   {doc_count_error_upper_bound: 0,
    sum_other_doc_count: 0,
    buckets: [{key: "Facebook", doc_count: 528568}]}},

Of course, the name and URL of the platform can be extracted from this structure (like bucket.url.buckets.first.key), but is there a cleaner and simpler way to do the task?
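For reference, the extraction described in the question can be sketched in Python as follows. The helper names `first_key` and `flatten_bucket` are hypothetical, and `sample_bucket` simply restates the bucket shown above; the sketch assumes each nested terms sub-aggregation holds the value of interest in its first bucket.

```python
def first_key(sub_aggregation):
    """Return the key of the first bucket of a terms sub-aggregation, or None."""
    buckets = sub_aggregation.get("buckets", [])
    return buckets[0]["key"] if buckets else None


def flatten_bucket(bucket):
    """Collapse one platform bucket into a flat row for output."""
    return {
        "id": bucket["key"],
        "count": bucket["doc_count"],
        "name": first_key(bucket["name"]),
        "url": first_key(bucket["url"]),
    }


# The bucket from the response above, written as a Python dict.
sample_bucket = {
    "key": 7,
    "doc_count": 528568,
    "url": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [{"key": "http://facebook.com", "doc_count": 528568}],
    },
    "name": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [{"key": "Facebook", "doc_count": 528568}],
    },
}

print(flatten_bucket(sample_bucket))
```

This works, but it needs one sub-aggregation and one lookup per extra field, which is the clutter the question is trying to avoid.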

Solution

It seems the best way to express the intent is a top hits aggregation: "from each aggregated group, select only one document", and then extract the platform from it:

aggs: {
  platforms: {
    terms: {field: 'platform.id'},
    aggs: {
      platform: {top_hits: {size: 1, _source: {include: ['platform']}}}
    }
  }
}

This way, each bucket will look like:

{"key": 7,
  "doc_count": 529939,
  "platform": {
    "hits": {
      "hits": [{
        "_source": {
          "platform":
            {"id": 7, "name": "Facebook", "url": "http://facebook.com"}
        }
      }]
    }
  }
}

This is rather deeply nested (as usual with ES), but clean: bucket.platform.hits.hits.first._source.platform
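The same access path can be sketched in Python, under the assumption that the response has been parsed into plain dicts. The `platform_stats` helper is hypothetical and `sample_bucket` restates the bucket shown above.

```python
def platform_stats(bucket):
    """Merge the platform object from the single top hit with the bucket count."""
    hits = bucket["platform"]["hits"]["hits"]
    # size: 1 means at most one hit; fall back to an empty dict if there is none.
    platform = hits[0]["_source"]["platform"] if hits else {}
    return {**platform, "count": bucket["doc_count"]}


# The bucket from the answer above, written as a Python dict.
sample_bucket = {
    "key": 7,
    "doc_count": 529939,
    "platform": {
        "hits": {
            "hits": [{
                "_source": {
                    "platform": {
                        "id": 7,
                        "name": "Facebook",
                        "url": "http://facebook.com",
                    }
                }
            }]
        }
    },
}

print(platform_stats(sample_bucket))
```

This relies on every document in a bucket carrying the same platform object, since top hits returns just one of them. It is also worth checking the source-filter key against your Elasticsearch version: newer releases spell it `includes` rather than `include`.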
