Set _id as update key in Logstash Elasticsearch


Problem description

A document in my index looks like this:

{
"_index": "mydata",
"_type": "_doc",
"_id": "PuhnbG0B1IIlyY9-ArdR",
"_score": 1,
"_source": {
"age": 9,
"@version": "1",
"updated_on": "2019-01-01T00:00:00.000Z",
"id": 4,
"name": "Emma",
"@timestamp": "2019-09-26T07:09:11.947Z"
}
}

My Logstash conf for updating data is:

input {

    jdbc {
        jdbc_connection_string => "***"
        jdbc_driver_class => "***"
        jdbc_driver_library => "***"
        jdbc_user => ***
        statement => "SELECT * from agedata WHERE updated_on > :sql_last_value ORDER BY updated_on"
        use_column_value => true
        tracking_column => "updated_on"
        tracking_column_type => "timestamp"
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "mydata"
        action => "update"
        document_id => "%{_id}"
        doc_as_upsert => true
    }
    stdout { codec => rubydebug }
}

When I run this after changing a row, I expect the existing document with that _id to be updated with the changes I made. Instead, Elasticsearch indexes a new document whose _id is the literal string "%{_id}":

"_index": "agesep",
"_type": "_doc",
"_id": "%{_id}"

A duplicate appears when I use document_id => "%{id}". The existing document:

{
"_index": "mydata",
"_type": "_doc",
"_id": "BuilbG0B1IIlyY9-4P7t",
"_score": 1,
"_source": {
"id": 1,
"age": 13,
"name": "Greg",
"updated_on": "2019-09-26T08:11:00.000Z",
"@timestamp": "2019-09-26T08:17:52.974Z",
"@version": "1"
}
}

The duplicate:

{
"_index": "mydata",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
"age": 56,
"@version": "1",
"id": 1,
"name": "Greg",
"updated_on": "2019-09-26T08:18:00.000Z",
"@timestamp": "2019-09-26T08:20:14.561Z"
}
}

How do I get it to use the existing _id and not create a duplicate when I make updates in Elasticsearch? I expect the data in the index to be updated based on the _id, without a new document being created for each update.

Recommended answer

I suggest using id instead of _id:

        document_id => "%{id}"
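
A likely reason this matters: %{...} in Logstash refers to a field on the event. Rows read by the jdbc input carry the table's columns (here id, name, age, updated_on) but no _id field, so "%{_id}" is left as literal text. The sketch below shows the output section with the change applied. Everything except document_id is copied from the question, and the comments are added for illustration. It assumes id uniquely identifies a row in agedata.

```
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "mydata"
        action => "update"
        # Use the row's id column as the Elasticsearch _id
        # (assumption: id is unique per row in agedata).
        document_id => "%{id}"
        # Create the document if none with this _id exists yet.
        doc_as_upsert => true
    }
    stdout { codec => rubydebug }
}
```

Documents indexed earlier with an auto-generated _id (such as "BuilbG0B1IIlyY9-4P7t") are not matched by this setting, which would explain the duplicate seen in the question. One option is to delete or reindex those documents so that every _id comes from id. Later updates to the same row should then land on the same document.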
