Extracting data from multiple events from Elasticsearch using a single Logstash filter


Question

I have log lines loaded in Elasticsearch with the data scattered across multiple events: say event_id is in event (line) number 5, event_action is available in event number 88, and the event_port information is available in event number 455. How can I extract this data so that my output looks like the following? The multiline codec will not work for this case.

{
  "event_id": 1223,
  "event_action": "socket_open",
  "event_port": 76654
}

Currently I have the log files persisted, so I can get the file path from ES. I tried to have a shell script executed from a ruby filter; the script performs grep commands and puts the stdout data in a new event, like the following.

input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "my-logs"
  }
}

filter {
  ruby {
    code => 'require "open3"
             file_path = event.get("file_path")
             # capture3 passes the arguments without shell interpolation and
             # returns stdout, stderr and the exit status in a single call
             out, err, _status = Open3.capture3("my_filter.sh", "-f", file_path.to_s)
             event.set("process_result", out)
             if err.to_s.empty?
               filter_matched(event)
             else
               event.set("ext_script_err_msg", err)
             end'
    remove_field => ["file_path"]
  }
}

With the above approach I am facing two problems:

1) Doing grep on huge files can be time consuming. Is there any alternative, without having to grep the files?

2) My input plugin (shown above) takes events from Elasticsearch where file_path is set on ALL events in the index, so my_filter.sh is executed once per event, which I want to avoid. How can I extract the unique file_path values from ES?
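One way to get each distinct file_path only once is a terms aggregation rather than a scan over all hits. A minimal sketch in Python, assuming the default dynamic mapping gave the field a keyword sub-field named file_path.keyword, that there are at most ~1000 distinct paths (adjust "size" otherwise), and that the requests library is available:

import requests

# Ask Elasticsearch for the distinct file_path values only; "size": 0
# suppresses the hits themselves, we only want the aggregation buckets.
query = {
    "size": 0,
    "aggs": {
        "unique_paths": {
            "terms": {"field": "file_path.keyword", "size": 1000}
        }
    }
}

resp = requests.post("http://localhost:9200/my-logs/_search", json=query)
resp.raise_for_status()
buckets = resp.json()["aggregations"]["unique_paths"]["buckets"]
unique_paths = [b["key"] for b in buckets]
print(unique_paths)

Each path then only needs to be processed a single time, instead of once per event that carries it.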

Answer

Elasticsearch was not made to build an output stream depending on input. Elastic is a NoSQL database where the data should be consumed over time (in a real-time approach). That means you should first store everything in Elasticsearch and process the data afterwards. In your case, you are stalling the flow by waiting for different events.

If you really need to catch these events and process them in the background, you could try something like nxlog before filtering in Logstash (with nxlog as the input), or a Python script (used as a filter in Logstash). In your case I would pre-process the data to consolidate it, and then send it to Logstash, as in the sketch below.
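A minimal sketch of that pre-processing step in Python. The field names, regex patterns, and log path here are assumptions for illustration; adapt the parsing to the real log format. The idea is a single pass over the file that merges the scattered fields into one record, so no repeated grep is needed:

import json
import re

# Hypothetical patterns, one per scattered field; replace with the
# actual layout of the log lines.
PATTERNS = {
    "event_id": re.compile(r"event_id=(\d+)"),
    "event_action": re.compile(r'event_action="([^"]+)"'),
    "event_port": re.compile(r"event_port=(\d+)"),
}

def consolidate(path):
    """Scan the log file once, collecting the scattered fields."""
    record = {}
    with open(path) as f:
        for line in f:
            for field, pattern in PATTERNS.items():
                if field not in record:
                    m = pattern.search(line)
                    if m:
                        record[field] = m.group(1)
            if len(record) == len(PATTERNS):
                break  # all fields found, stop reading
    return record

if __name__ == "__main__":
    # The consolidated event can then be shipped to Logstash, e.g.
    # written as one JSON line and picked up by a file or tcp input.
    print(json.dumps(consolidate("/var/log/my_app.log")))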
