Document count is same but index size is growing every logstash run

Question
I'm sending data contained in a MySQL database to Elasticsearch using Logstash. But each time Logstash runs, the number of documents remains the same, while the index size increases.
First run — count: 333 | size in bytes: 206kb
Now — count: 333 | size in bytes: 1.6MB
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://***rds.amazonaws.com:3306/"
    jdbc_user => "***"
    jdbc_password => "***"
    jdbc_driver_library => "***\mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT id,title,url FROM tableName"
    schedule => "*/2 * * * *"
  }
}
filter {
  json {
    source => "texts"
    target => "texts"
  }
  mutate { remove_field => [ "@version", "@timestamp" ] }
}
output {
  stdout {
    codec => json_lines
  }
  amazon_es {
    hosts => ["***es.amazonaws.com"]
    document_id => "%{id}"
    index => "texts"
    region => "***"
    aws_access_key_id => '***'
    aws_secret_access_key => '***'
  }
}
Answer
Apparently you're always sending the same data over and over. In ES, each time you update a document (i.e. index a new document with the same ID), the older version gets marked as deleted but remains in the index for a while (until the underlying index segments get merged).
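If the goal is to avoid re-sending unchanged rows in the first place, the jdbc input can remember the last value it saw and only fetch newer rows on each scheduled run. A minimal sketch of the input section, assuming `id` is an auto-incrementing numeric column (the connection details and `tableName` are the placeholders from the question):

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://***rds.amazonaws.com:3306/"
    jdbc_user => "***"
    jdbc_password => "***"
    jdbc_driver_library => "***\mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # Track the highest id seen so far and only select rows beyond it
    use_column_value => true
    tracking_column => "id"
    tracking_column_type => "numeric"
    statement => "SELECT id,title,url FROM tableName WHERE id > :sql_last_value"
    schedule => "*/2 * * * *"
  }
}
```

Note this only picks up new rows; if existing rows can be edited, a `WHERE updated_at > :sql_last_value` condition on a timestamp column (with `tracking_column_type => "timestamp"`) would be needed instead.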
Between each run, you can issue the following command:
curl -XGET ***es.amazonaws.com/_cat/indices?v
In the response you get, check the docs.deleted column and you'll see that the number of deleted documents increases.
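Elasticsearch normally reclaims this space on its own as segments merge in the background, so the growth is temporary. If the space needs to be reclaimed sooner, the deleted documents can be expunged manually with a force merge — a sketch against the same placeholder endpoint and the `texts` index from the config:

```
curl -XPOST "***es.amazonaws.com/texts/_forcemerge?only_expunge_deletes=true"
```

Force merging is an expensive operation, so it is best reserved for occasional cleanup rather than after every Logstash run.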