Logstash + Kibana terms panel without breaking words


Question

I have a Java application that writes to a log file in JSON format. The fields that appear in the logs are variable. Logstash reads this log file and sends it to Kibana.

I've configured Logstash with the following file:

input {
        file {
                path => ["[log_path]"]
                codec => "json"
        }
}

filter{
        json {
                source => "message"
        }

        date {
                match => [ "data", "dd-MM-yyyy HH:mm:ss.SSS" ]
                timezone => "America/Sao_Paulo"
        }
}

output {
        elasticsearch_http {
                flush_size => 1
                host => "[host]"
                index => "application-%{+YYYY.MM.dd}"
        }
}
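For reference, here is a rough sketch of the kind of log line this pipeline would handle and how the date filter's pattern maps onto it. The field names (`data`, `server`, `message`) and the sample values are assumptions for illustration, not taken from the real application; the `strptime` format is the Python equivalent of the Joda pattern `dd-MM-yyyy HH:mm:ss.SSS` used above.

```python
import json
from datetime import datetime

# A hypothetical log line in the shape the pipeline above expects;
# the field names and values are illustrative assumptions.
line = ('{"data": "25-03-2015 14:02:07.123", '
        '"server": "a1-name-server1", '
        '"message": "Error processing request"}')

# The json codec/filter parses the line into an event.
event = json.loads(line)

# The date filter pattern "dd-MM-yyyy HH:mm:ss.SSS" corresponds to
# this strptime format (%f accepts the 3-digit millisecond part).
timestamp = datetime.strptime(event["data"], "%d-%m-%Y %H:%M:%S.%f")

print(timestamp.isoformat())  # 2015-03-25T14:02:07.123000
print(event["server"])        # a1-name-server1
```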



I've managed to display everything in Kibana correctly without any mapping. But when I try to create a terms panel to show a count of the servers that sent those messages, I have a problem. I have a field called server in my JSON that shows the server name (like: a1-name-server1), but the terms panel splits the server name because of the "-". I would also like to count the number of times an error message appears, but the same problem occurs, because the terms panel splits the error message on the spaces.

I'm using Kibana 3 and Logstash 1.4. I've searched a lot on the web and couldn't find any solution. I also tried using the .raw field from Logstash, but it didn't work.

How can I handle this?

Thanks for the help.

Answer

Your problem here is that your data is being tokenized. This is helpful for searching over your data: ES (by default) will split your message field into different parts so that they can be searched. For example, you may want to search for the word ERROR in your logs, so you would probably like to see results with messages like "There was an error in your cluster" or "Error processing whatever". If the data in that field is not analyzed with tokenizers, you won't be able to search like this.
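As a rough illustration of why the terms panel breaks the values apart, here is a crude approximation of what a standard-style analyzer does. This is a simplified sketch, not the actual Elasticsearch implementation: it just lowercases and splits on anything that isn't a letter or digit.

```python
import re

def simple_tokenize(text):
    """Crude approximation of the standard analyzer:
    lowercase, then split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

# An analyzed field is indexed as separate tokens, so a terms
# panel sees three distinct values instead of one server name.
print(simple_tokenize("a1-name-server1"))
# ['a1', 'name', 'server1']

# The same thing happens to error messages, split on spaces.
print(simple_tokenize("Error processing whatever"))
# ['error', 'processing', 'whatever']
```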

This analyzed behaviour is helpful when you want to search for things, but it doesn't let you group different messages that have the same content, which is your use case. The solution is to update your mapping, setting not_analyzed for the specific field that you don't want split into tokens. This would probably work for your server field, but it would break search over that field.

What I usually do for this kind of situation is to use index templates and multifields. An index template lets me set a mapping for every index whose name matches a pattern, and a multifield lets me have both the analyzed and the not_analyzed behaviour in the same field.

Using the following request would do the job for your problem:

curl -XPUT https://example.org/_template/name_of_index_template -d '
{
    "template": "indexname*",
    "mappings": {
        "type": {
            "properties": {
                "field_name": {
                    "type": "multi_field",
                    "fields": {
                        "field_name": {
                            "type": "string",
                            "index": "analyzed"
                        },
                        "untouched": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            }
        }
    }
}'

And then in your terms panel you can use field_name.untouched, so that the entire content of the field is considered when counting the different elements.
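Under the hood, the count the terms panel computes corresponds to a terms facet on the untouched subfield. A sketch of the equivalent raw request, with the host, index pattern, and field names as placeholders:

```
curl -XGET 'https://example.org/indexname-*/_search?search_type=count' -d '
{
    "facets": {
        "servers": {
            "terms": { "field": "field_name.untouched" }
        }
    }
}'
```

Because the subfield is not_analyzed, each bucket holds the whole value (e.g. a1-name-server1) instead of its individual tokens.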

If you don't want to use an index template (maybe your data is in a single index), setting the mapping with the Put Mapping API would do the job too. And if you use multifields, there is no need to reindex the data, because from the moment you set the new mapping for the index, new data will be duplicated into the two subfields (field_name and field_name.untouched). If you just change the mapping from analyzed to not_analyzed, you won't see any change until you reindex all your data.
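For the single-index case, the equivalent Put Mapping request would look something like the following. The index, type, and field names are placeholders, mirroring the template above:

```
curl -XPUT https://example.org/indexname/_mapping/type -d '
{
    "type": {
        "properties": {
            "field_name": {
                "type": "multi_field",
                "fields": {
                    "field_name": { "type": "string", "index": "analyzed" },
                    "untouched":  { "type": "string", "index": "not_analyzed" }
                }
            }
        }
    }
}'
```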

