Kibana - How to extract fields from existing Kubernetes logs
Question
I have a sort of ELK stack, with Fluentd instead of Logstash, running as a DaemonSet on a Kubernetes cluster and sending all logs from all containers, in Logstash format, to an Elasticsearch server.
Among the many containers running on the Kubernetes cluster, some are nginx containers which output logs of the following format:
121.29.251.188 - [16/Feb/2017:09:31:35 +0000] host="subdomain.site.com" req="GET /data/schedule/update?date=2017-03-01&type=monthly&blocked=0 HTTP/1.1" status=200 body_bytes=4433 referer="https://subdomain.site.com/schedule/2589959/edit?location=23092&return=monthly" user_agent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0" time=0.130 hostname=webapp-3188232752-ly36o
The fields visible in Kibana are as per this screenshot:
Is it possible to extract fields from this type of log after it has been indexed?
The Fluentd collector is configured with the following source, which handles all containers, so enforcing a format at this stage is not possible due to the very different outputs from different containers:
<source>
type tail
path /var/log/containers/*.log
pos_file /var/log/es-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag kubernetes.*
format json
read_from_head true
</source>
In an ideal situation, I would like to enrich the fields visible in the screenshot above with the meta-fields in the "log" field, like "host", "req", "status" etc.
Answer
After a few days of research and getting accustomed to the EFK stack, I arrived at an EFK-specific solution, as opposed to the one in Darth_Vader's answer, which is only possible on the ELK stack.
So to summarize: I am using Fluentd instead of Logstash, so any grok solution would only work if you also installed the Fluentd Grok Plugin, which I decided not to do.
As it turns out, Fluentd has its own field extraction functionality through the use of parser filters. To solve the problem in my question, right before the <match **> line, so after the log line object has already been enriched with the Kubernetes metadata fields and labels, I added the following:
<filter kubernetes.var.log.containers.webapp-**.log>
type parser
key_name log
reserve_data yes
format /^(?<ip>[^-]*) - \[(?<datetime>[^\]]*)\] host="(?<hostname>[^"]*)" req="(?<method>[^ ]*) (?<uri>[^ ]*) (?<http_version>[^"]*)" status=(?<status_code>[^ ]*) body_bytes=(?<body_bytes>[^ ]*) referer="(?<referer>[^"]*)" user_agent="(?<user_agent>[^"]*)" time=(?<req_time>[^ ]*)/
</filter>
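To sanity-check the pattern outside Fluentd, the same extraction can be sketched in Python against the sample nginx line from the question. Note that Python spells named groups `(?P<name>...)`, whereas Fluentd (Ruby) uses `(?<name>...)`; otherwise the pattern is identical.

```python
import re

# Python translation of the Fluentd pattern above; only the named-group
# syntax differs ((?P<name>...) instead of (?<name>...)).
pattern = re.compile(
    r'^(?P<ip>[^-]*) - \[(?P<datetime>[^\]]*)\] host="(?P<hostname>[^"]*)"'
    r' req="(?P<method>[^ ]*) (?P<uri>[^ ]*) (?P<http_version>[^"]*)"'
    r' status=(?P<status_code>[^ ]*) body_bytes=(?P<body_bytes>[^ ]*)'
    r' referer="(?P<referer>[^"]*)" user_agent="(?P<user_agent>[^"]*)"'
    r' time=(?P<req_time>[^ ]*)'
)

# Sample nginx log line from the question
line = (
    '121.29.251.188 - [16/Feb/2017:09:31:35 +0000] host="subdomain.site.com" '
    'req="GET /data/schedule/update?date=2017-03-01&type=monthly&blocked=0 HTTP/1.1" '
    'status=200 body_bytes=4433 '
    'referer="https://subdomain.site.com/schedule/2589959/edit?location=23092&return=monthly" '
    'user_agent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0" '
    'time=0.130 hostname=webapp-3188232752-ly36o'
)

# groupdict() yields exactly the fields Fluentd would add to the record
fields = pattern.match(line).groupdict()
print(fields['method'], fields['status_code'], fields['req_time'])
# GET 200 0.130
```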
Explanation:
<filter kubernetes.var.log.containers.webapp-**.log>
- apply the block on all the lines matching this label; in my case the containers of the web server component are called webapp-{something}
type parser
- tells Fluentd to apply a parser filter
key_name log
- apply the pattern only on the log property of the log line, not the whole line, which is a JSON string
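For context, each line in /var/log/containers/*.log (with Docker's json-file logging driver) is itself a JSON object, roughly shaped as below; key_name log points the parser at the log property inside it (the values here are abbreviated and illustrative):

```
{
  "log": "121.29.251.188 - [16/Feb/2017:09:31:35 +0000] host=\"subdomain.site.com\" ...",
  "stream": "stdout",
  "time": "2017-02-16T09:31:35.000000000Z"
}
```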
reserve_data yes
- very important: if not specified, the whole log line object is replaced by only the properties extracted from format, so if you already have other properties, like the ones added by the kubernetes_metadata filter, these are removed when the reserve_data option is not set
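The effect of reserve_data can be illustrated with plain dictionaries (the record shapes below are assumptions for illustration, not literal Fluentd output):

```python
# Illustration of what reserve_data controls, using plain dicts.
# The record shapes are assumptions, not actual Fluentd internals.

original_record = {
    "log": '121.29.251.188 - [16/Feb/2017:09:31:35 +0000] ... status=200 ...',
    "stream": "stdout",
    "kubernetes": {"pod_name": "webapp-3188232752-ly36o"},
}
extracted = {"ip": "121.29.251.188", "status_code": "200"}

# reserve_data yes: parsed fields are merged into the existing record
with_reserve = {**original_record, **extracted}

# reserve_data not set (the default): the record is replaced
# by only the extracted fields
without_reserve = dict(extracted)

print(sorted(with_reserve))     # original keys plus the extracted ones
print(sorted(without_reserve))  # only the extracted keys
```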
format
- a regex that is applied on the value of the log key to extract named properties
Please note that I am using Fluentd 0.12, so this syntax is not fully compatible with the newer 0.14 syntax, but the principle will work with minor tweaks to the parser declaration.
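For reference, on newer Fluentd versions the same filter is declared with the @type directive and a nested <parse> section; a rough, untested sketch of the equivalent configuration:

```
<filter kubernetes.var.log.containers.webapp-**.log>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type regexp
    expression /^(?<ip>[^-]*) - \[(?<datetime>[^\]]*)\] host="(?<hostname>[^"]*)" req="(?<method>[^ ]*) (?<uri>[^ ]*) (?<http_version>[^"]*)" status=(?<status_code>[^ ]*) body_bytes=(?<body_bytes>[^ ]*) referer="(?<referer>[^"]*)" user_agent="(?<user_agent>[^"]*)" time=(?<req_time>[^ ]*)/
  </parse>
</filter>
```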