How to deduplicate documents while indexing into elasticsearch from logstash
Question
I'm using Logstash 1.4.1 together with ES 1.01 and would like to replace already indexed documents based on a calculated checksum. I'm currently using the "fingerprint" filter in Logstash, which creates a "fingerprint" field based on a specified algorithm. What I want to accomplish is for ES to replace an already existing document that has an identical fingerprint value.
Say, for example, that I have a document with a fingerprint-field value of "2c9a6802e10fbcff36177e0b88993f90868fa6fa". If a document with an identical fingerprint value is about to be indexed, I want it to replace the old document already present in the index.
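The whole approach relies on identical content always producing an identical fingerprint. The Python sketch below illustrates that property, assuming the fingerprint is an HMAC-SHA1 over the `message` field; the helper name `fingerprint` and the key `"my-secret"` are hypothetical, and the real filter's output depends on its own method, key and source settings, so these digests will not necessarily match what Logstash produces.

```python
import hashlib
import hmac


def fingerprint(event: dict, source: str = "message", key: str = "my-secret") -> str:
    """Return a hex HMAC-SHA1 of one event field (hypothetical helper).

    The key and the choice of HMAC-SHA1 are illustrative assumptions,
    not a reproduction of the Logstash fingerprint filter.
    """
    data = str(event[source]).encode("utf-8")
    return hmac.new(key.encode("utf-8"), data, hashlib.sha1).hexdigest()


first = {"message": "user=alice action=login"}
second = {"message": "user=alice action=login"}  # same content, arriving later
third = {"message": "user=bob action=login"}

print(fingerprint(first) == fingerprint(second))  # True
print(fingerprint(first) == fingerprint(third))   # False
print(len(fingerprint(first)))  # 40, the same length as the example value above
```

Because the digest depends only on the field content, it can serve as a stable key for "same event seen again".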
I've tried to add the following to the "elasticsearch-template.json" template file which I assume is used by the Logstash ES-output plugin:
...
"mappings" : {
"_default_" : {
"_id" : {"index": "not_analyzed", "store" : false, "path" : "fingerprint" },
"_all" : {"enabled" : true},
"dynamic_templates" : [ {
...
but it doesn't work. What am I doing wrong here?
Cheers
Answer
I would use the document_id parameter in your logstash elasticsearch output section:
document_id
Value type is string
Default value is nil
The document ID for the index. Useful for overwriting existing entries in Elasticsearch with the same ID.
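Indexing with an explicit ID behaves like a keyed write: a second document with the same ID replaces the first instead of being stored next to it. The toy in-memory index below sketches that behaviour; the class `ToyIndex` and its methods are hypothetical (not an Elasticsearch client), and the per-ID counter is only loosely modelled on Elasticsearch's `_version`.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """In-memory stand-in for an index whose documents are keyed by ID."""

    docs: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)

    def index(self, doc_id: str, source: dict) -> int:
        # Writing to an existing ID replaces the stored source wholesale
        # and bumps its version; a new ID starts at version 1.
        self.docs[doc_id] = dict(source)
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return self.versions[doc_id]

    def index_auto(self, source: dict) -> str:
        # Without an explicit ID a fresh unique ID is generated per write,
        # so identical events pile up as separate documents.
        doc_id = uuid.uuid4().hex
        self.index(doc_id, source)
        return doc_id


idx = ToyIndex()
fp = "2c9a6802e10fbcff36177e0b88993f90868fa6fa"
idx.index(fp, {"message": "old copy", "fingerprint": fp})
version = idx.index(fp, {"message": "new copy", "fingerprint": fp})

print(len(idx.docs))            # 1: the second write replaced the first
print(idx.docs[fp]["message"])  # new copy
print(version)                  # 2
```

Without `document_id`, Elasticsearch auto-generates a unique ID for every event, which corresponds to `index_auto` here: each duplicate is kept as its own document.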
I believe the entry should be something like this:
document_id => "%{fingerprint}"
It uses Logstash's sprintf format, which replaces a %{field} reference in a string with the contents of that field:
https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#sprintf
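To make the substitution concrete, here is a simplified Python sketch of what `"%{fingerprint}"` resolves to. The function name `sprintf` is hypothetical, it handles only flat top-level field names (not nested `%{[a][b]}` references), and leaving unresolved references untouched is an assumption about Logstash's behaviour rather than a reimplementation of it.

```python
import re

# Matches a Logstash-style field reference such as %{fingerprint}.
FIELD_REF = re.compile(r"%\{([^}]+)\}")


def sprintf(template: str, event: dict) -> str:
    """Replace each %{name} in template with event[name] (hypothetical helper).

    References to missing fields are left as literal text (assumed to
    mirror Logstash), so events lacking the field would all end up with
    the same literal ID "%{fingerprint}".
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(event[name]) if name in event else match.group(0)

    return FIELD_REF.sub(substitute, template)


event = {"fingerprint": "2c9a6802e10fbcff36177e0b88993f90868fa6fa"}
print(sprintf("%{fingerprint}", event))  # 2c9a6802e10fbcff36177e0b88993f90868fa6fa
print(sprintf("%{fingerprint}", {}))     # %{fingerprint}
```

The second call shows why the fingerprint filter has to run before the output: if the field is absent, every such event would share one literal ID and overwrite each other.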