Logstash + Elasticsearch: reloads the same data

Question

I managed to get Logstash (1.3.1) to send data to Elasticsearch (0.9.5).

My Logstash config file is:

input {
  file {
    path => ["D:/apache-tomcat-7.0.5/logs/*.*"]
  }
}

output {
  stdout { }
  elasticsearch_http {
    host => "localhost"
    port => 9200
  }
}

The data is stored in ES under the index logstash-2013.12.xx.

However, if I restart Logstash, say the next day, the same data is reloaded into a new index. And even if I just restart again, the document count in the index doubles.

It seems Logstash is re-reading the data and ES is duplicating the documents.

Is there a way to stop Logstash from reloading the data, or to stop ES from duplicating it, or both?

Recommended answer

I ran across this issue with Logstash 1.3.3 as well. The relevant bug report on the Logstash Jira is LOGSTASH-429, "File Input - .sincedb file is broken on Windows". Boyd Meier has also created a patch for it.

This patch has also been pulled into Jordan Sissel's ruby-filewatch git repository for inclusion in a later version; however, it hasn't made it into a release yet.

The issue comes from Logstash using the file inode, which is always 0 on Windows. Boyd Meier's patch uses the Windows File ID as the file's identifier instead, which bypasses the issue. The File ID stays the same until the file is deleted from the volume.
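
To illustrate why a constant inode breaks position tracking, here is a hypothetical, simplified Ruby model of a sincedb-style table that remembers a read offset per file key. It is not ruby-filewatch's code; the file names, IDs, and offsets are made up. When every file reports inode 0, all files collapse onto one key and only one offset survives, whereas keying by a per-file ID keeps one entry per file.

```ruby
# Hypothetical, simplified model of a sincedb-style position table.
# NOT ruby-filewatch's implementation; it only shows why the key choice matters.

# Store each file's read offset under whatever key `key_for` returns.
def position_table(files, key_for)
  files.each_with_object({}) do |file, table|
    table[key_for.call(file)] = file[:offset]
  end
end

# Two made-up log files. Inode is 0 for both (the Windows behaviour described
# above); the File ID values are invented placeholders that differ per file.
FILES = [
  { path: "a.log", inode: 0, file_id: "vol1-0-101", offset: 5_000 },
  { path: "b.log", inode: 0, file_id: "vol1-0-102", offset: 1_200 }
].freeze

by_inode   = position_table(FILES, ->(f) { f[:inode] })
by_file_id = position_table(FILES, ->(f) { f[:file_id] })

puts by_inode.size   # 1: a.log's offset was overwritten by b.log's
puts by_file_id.size # 2: each file keeps its own offset
```

With only one surviving entry, the offsets of the other files are lost, which is consistent with files being read again from the start after a restart.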

If you're comfortable doing a bit of patching, you can pull the change in from Jordan Sissel's ruby-filewatch git repository. For 1.3.3 (which I have only just patched and am still testing against test log files), the steps were:


  1. Download the ruby-filewatch zip file from GitHub: Jordan Sissel's ruby-filewatch git repository.
  2. Unzip the downloaded zip file into a new directory.
  3. I had to change the file ruby-filewatch\lib\filewatch\tail.rb. Line 10 reads require "JRubyFileExtension.jar"; I changed it to require "java/JRubyFileExtension.jar", because otherwise I got an error that the jar file could not be found when trying to read a file. For reference, the whole line then reads: require "java/JRubyFileExtension.jar" if defined? JRUBY_VERSION
  4. Open the logstash-1.3.3-flatjar.jar file in 7-Zip.
  5. Drag and drop the java directory from ruby-filewatch into the root folder of the jar in 7-Zip.
  6. Drag and drop all the files from the ruby-filewatch\lib\filewatch folder into the filewatch folder in 7-Zip, overwriting any existing files.

Now when you run it against multiple log files, you should find that the sincedb contains more than one entry, with entries similar to 1717916447-2604966-851968 0 2 428312038. If you're having trouble finding the sincedb file and haven't set sincedb_path in your config file, it is in the home directory of the user running the jar. If that is your user, you can get there easily with Windows key + R (Run) -> %USERPROFILE% -> OK.
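
To check the patched sincedb programmatically, here is a hypothetical Ruby helper. It assumes each line holds four whitespace-separated fields (file identifier, device major number, device minor number, byte offset), which fits the example entry 1717916447-2604966-851968 0 2 428312038; the struct and method names are my own, not part of Logstash or ruby-filewatch.

```ruby
# Assumed sincedb line layout (hypothetical parser, not part of Logstash):
#   <file identifier> <device major> <device minor> <byte offset>
SincedbEntry = Struct.new(:file_id, :dev_major, :dev_minor, :offset)

def parse_sincedb_line(line)
  fields = line.split
  unless fields.size == 4
    raise ArgumentError, "expected 4 fields, got #{fields.size}: #{line.inspect}"
  end
  id, major, minor, offset = fields
  # Base 10 so values with leading zeros are not parsed as octal.
  SincedbEntry.new(id, Integer(major, 10), Integer(minor, 10), Integer(offset, 10))
end

# True when every non-blank line has a distinct file identifier, i.e. the
# files are no longer collapsing onto a single key such as inode 0.
def distinct_file_ids?(text)
  entries = text.each_line.map(&:strip).reject(&:empty?).map { |l| parse_sincedb_line(l) }
  entries.map(&:file_id).uniq.size == entries.size
end

entry = parse_sincedb_line("1717916447-2604966-851968 0 2 428312038")
puts entry.file_id # 1717916447-2604966-851968
puts entry.offset  # 428312038
```

To inspect a real file you could pass File.read(path) to distinct_file_ids?, where path is the location of your sincedb file.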

As always, take care when patching, and test thoroughly before deploying to production systems.
