How to deduplicate documents while indexing into Elasticsearch from Logstash

Question

I'm using Logstash 1.4.1 together with ES 1.01 and would like to replace already-indexed documents based on a calculated checksum. I'm using the "fingerprint" filter in Logstash, which creates a "fingerprint" field based on a specified algorithm. What I want is for ES to replace an existing document whenever a new document has an identical fingerprint value.

Say, for example, that I have a document whose "fingerprint" field has the value "2c9a6802e10fbcff36177e0b88993f90868fa6fa". If a document with an identical fingerprint value is about to be indexed, I want it to replace the old document already in the index.
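
For reference, a value of this shape (40 hex characters) is what a SHA1 digest looks like. The Python sketch below is only an illustration of why such a fingerprint works as a deduplication key: it computes a plain SHA1 over a hypothetical message string with `hashlib`, or an HMAC-SHA1 when a key is passed. The function name and the sample messages are made up; the Logstash fingerprint filter computes an HMAC when its `key` option is set, so its output will only match this sketch if the same input, method and key are used.

```python
import hashlib
import hmac
from typing import Optional


def fingerprint(message: str, key: Optional[str] = None) -> str:
    """Return a hex SHA1 fingerprint of `message`.

    Plain SHA1 when no key is given, HMAC-SHA1 when a key is supplied.
    This only illustrates the idea behind the Logstash fingerprint filter;
    it is not guaranteed to reproduce that filter's exact output.
    """
    data = message.encode("utf-8")
    if key is None:
        return hashlib.sha1(data).hexdigest()
    return hmac.new(key.encode("utf-8"), data, hashlib.sha1).hexdigest()


# Identical input always gives an identical fingerprint, which is what makes
# the value usable as a deduplication key. The messages are made-up examples.
a = fingerprint("user=alice action=login")
b = fingerprint("user=alice action=login")
c = fingerprint("user=bob action=login")
print(len(a), a == b, a == c)  # prints: 40 True False
```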

I've tried adding the following to the "elasticsearch-template.json" template file, which I assume is used by the Logstash ES output plugin:

...
  "mappings" : {
    "_default_" : {
       "_id" : {"index": "not_analyzed", "store" : false, "path" : "fingerprint" },
       "_all" : {"enabled" : true},
       "dynamic_templates" : [ {
...

But it doesn't work. What am I doing wrong here?

Cheers

Answer

I would use the document_id parameter in the elasticsearch output section of your Logstash config:


document_id

Value type is string
Default value is nil

The document ID for the index. Useful for overwriting existing entries in Elasticsearch with the same ID.

https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-document_id
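
To make the overwrite behaviour concrete, here is a toy in-memory model in Python. It is not the Elasticsearch API: `ToyIndex` and its `index` method are hypothetical names, documents live in a dict keyed by ID, and a random UUID stands in for an auto-generated ID. Indexing the same event twice without an explicit ID produces two entries, while indexing it twice under the same explicit ID (here the fingerprint from the question) leaves a single, replaced entry.

```python
import uuid
from typing import Any, Dict, Optional


class ToyIndex:
    """Hypothetical in-memory stand-in for an Elasticsearch index."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}

    def index(self, doc: Dict[str, Any], doc_id: Optional[str] = None) -> str:
        # Without an explicit ID, each call gets a fresh random ID,
        # so re-sending the same event creates a duplicate entry.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        # With an explicit ID, any entry stored under that ID is replaced.
        self.docs[doc_id] = doc
        return doc_id


# Made-up event; the fingerprint value is the one from the question.
event = {
    "message": "user=alice action=login",
    "fingerprint": "2c9a6802e10fbcff36177e0b88993f90868fa6fa",
}

auto = ToyIndex()
auto.index(event)
auto.index(event)
print(len(auto.docs))  # prints: 2 (duplicated)

by_id = ToyIndex()
by_id.index(event, doc_id=event["fingerprint"])
by_id.index(dict(event, seen=2), doc_id=event["fingerprint"])
print(len(by_id.docs))  # prints: 1 (second write replaced the first)
```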

I believe the entry should be something like this:

document_id => "%{fingerprint}"

It uses Logstash's sprintf format, which replaces a field reference with the contents of that field:

https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#sprintf
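
Conceptually, `%{fingerprint}` is replaced with the value of the event's `fingerprint` field. The Python function below is a deliberately simplified, hypothetical model of that substitution: it handles only top-level `%{name}` references and leaves references to missing fields untouched. Real Logstash sprintf also supports nested references such as `%{[a][b]}` and date patterns such as `%{+YYYY.MM.dd}`, which this sketch does not implement.

```python
import re
from typing import Any, Dict

# Matches %{name} where name is a simple top-level field name
# (a simplification of Logstash's field reference syntax).
_REF = re.compile(r"%\{([A-Za-z0-9_@.-]+)\}")


def sprintf(template: str, event: Dict[str, Any]) -> str:
    """Substitute %{field} references with values taken from `event`."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        # A reference to a missing field is left as the literal text.
        return str(event[name]) if name in event else match.group(0)

    return _REF.sub(replace, template)


event = {"fingerprint": "2c9a6802e10fbcff36177e0b88993f90868fa6fa"}
print(sprintf("%{fingerprint}", event))  # prints: 2c9a6802e10fbcff36177e0b88993f90868fa6fa
print(sprintf("%{fingerprint}", {}))  # prints: %{fingerprint}
```

The second call is the case to watch for: as far as I know, Logstash also leaves a reference to a missing field unexpanded, so if the fingerprint filter did not run for some events, they would all share the literal ID `%{fingerprint}` and overwrite one another.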