Dedup Elasticsearch results using multiple fields as unique key

Views: 130 | Published: 2020/10/27 0:51:20 | Tags: elasticsearch, duplicates

Question

Similar questions have been asked before (see "Remove duplicate documents from a search in Elasticsearch"), but I haven't found a way to dedup using multiple fields as the "unique key". Here's a simple example to illustrate what I'm looking for:

Say this is our raw data:

{ "name": "X", "event": "A", "time": 1 }
{ "name": "X", "event": "B", "time": 2 }
{ "name": "X", "event": "B", "time": 3 }
{ "name": "Y", "event": "A", "time": 4 }
{ "name": "Y", "event": "C", "time": 5 }

I would essentially like to get distinct event counts, where a document is unique by the combination of name and event. I want to avoid double counting event B, which happened twice on the same name X, so the counts I'd be looking for are:

event: A, count: 2
event: B, count: 1
event: C, count: 1
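
For reference, the expected counts above can be reproduced outside Elasticsearch. The following is a minimal Python sketch, assuming the five sample documents are held in memory as dicts; the names `docs`, `distinct_pairs`, and `distinct_event_counts` are illustrative and not part of any Elasticsearch API.

```python
from collections import Counter

# Sample documents from the question.
docs = [
    {"name": "X", "event": "A", "time": 1},
    {"name": "X", "event": "B", "time": 2},
    {"name": "X", "event": "B", "time": 3},
    {"name": "Y", "event": "A", "time": 4},
    {"name": "Y", "event": "C", "time": 5},
]

# Treat (name, event) as the unique key; the set drops repeated pairs.
distinct_pairs = {(d["name"], d["event"]) for d in docs}

# Count how many distinct names each event occurred on.
distinct_event_counts = Counter(event for _name, event in distinct_pairs)

for event in sorted(distinct_event_counts):
    print(f"event: {event}, count: {distinct_event_counts[event]}")
```

Event B is counted once because both of its documents share the pair (X, B).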

Is there a way to set up an agg query like the one in the related question? Another option I've considered is to index each object with a special key field (e.g. "X_A", "X_B", etc.) and then simply dedup on that field. I'm not sure which approach is preferable, but I'd personally rather not index the data with extra metadata.
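
To make the key-field alternative concrete, here is a small Python sketch of how such a field might be attached to a document before indexing. The helper `with_dedup_key` and the field name `dedup_key` are hypothetical and chosen only for illustration; the question does not name the field.

```python
import json


def with_dedup_key(doc, sep="_"):
    """Return a copy of doc with a combined name/event key.

    The field name "dedup_key" is a hypothetical choice for this sketch.
    """
    keyed = dict(doc)  # copy, so the caller's document is not mutated
    keyed["dedup_key"] = f'{doc["name"]}{sep}{doc["event"]}'
    return keyed


doc = {"name": "X", "event": "B", "time": 2}
print(json.dumps(with_dedup_key(doc)))

# Caveat: a plain separator is ambiguous if the values can contain it.
# These two different (name, event) pairs produce the same key.
first = with_dedup_key({"name": "a_b", "event": "c"})
second = with_dedup_key({"name": "a", "event": "b_c"})
print(first["dedup_key"] == second["dedup_key"])
```

If names or events may contain the separator, a character that cannot appear in either field would be a safer choice.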

Recommended answer

You can specify a script in a terms aggregation to build a key out of multiple fields:

POST /test/dedup/_search
{
  "aggs": {
    "dedup": {
      "terms": {
        "script": "[doc.name.value, doc.event.value].join('_')"
      },
      "aggs": {
        "dedup_docs": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

This produces one bucket per name/event key, with the following document counts:

X_A: 1
X_B: 2
Y_A: 1
Y_C: 1
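
These bucket counts are per name/event key, not yet the per-event counts the question asks for. The sketch below mimics the aggregation in plain Python on the sample data and then derives the per-event counts from the bucket keys. The client-side post-processing step is an assumption of this sketch and is not part of the original answer; the names `buckets` and `per_event` are illustrative.

```python
from collections import Counter

# Sample documents from the question.
docs = [
    {"name": "X", "event": "A", "time": 1},
    {"name": "X", "event": "B", "time": 2},
    {"name": "X", "event": "B", "time": 3},
    {"name": "Y", "event": "A", "time": 4},
    {"name": "Y", "event": "C", "time": 5},
]

# Mimic the scripted terms aggregation: one bucket per "name_event" key,
# whose count is the number of documents sharing that key.
buckets = Counter("_".join([d["name"], d["event"]]) for d in docs)
for key in sorted(buckets):
    print(f"{key}: {buckets[key]}")

# Each bucket stands for one distinct (name, event) pair, so counting
# buckets per event yields the distinct counts.
# rsplit assumes event values contain no underscore.
per_event = Counter(key.rsplit("_", 1)[1] for key in buckets)
for event in sorted(per_event):
    print(f"event: {event}, count: {per_event[event]}")
```

The X_B bucket holds two documents but contributes only one to event B, which is the deduplication the question is after.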

Note: There's only one event C in your sample data, so its count cannot be two unless I'm missing something.
