Managing a Large Number of Log Files Distributed Over Many Machines

Question

We have started using a third party platform (GigaSpaces) that helps us with distributed computing. One of the major problems we are trying to solve now is how to manage our log files in this distributed environment. We have the following setup currently.

Our platform is distributed over 8 machines. On each machine we have 12-15 processes that log to separate log files using java.util.logging. On top of this platform we have our own applications that use log4j and log to separate files. We also redirect stdout to a separate file to catch thread dumps and similar.
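
For context, a minimal sketch of the kind of per-process java.util.logging setup described here; the path and names are illustrative, not taken from the actual platform:

    import java.util.logging.FileHandler;
    import java.util.logging.Logger;
    import java.util.logging.SimpleFormatter;

    public class ProcessLogSetup {
        public static void main(String[] args) throws Exception {
            // Each process gets its own file; %u is j.u.l.'s unique-number
            // placeholder, which avoids clashes between processes on one machine
            FileHandler handler = new FileHandler("/var/log/platform/worker-%u.log", true);
            handler.setFormatter(new SimpleFormatter());

            // Attach to the root logger so all loggers in the process inherit it
            Logger.getLogger("").addHandler(handler);
            Logger.getLogger(ProcessLogSetup.class.getName()).info("process started");
        }
    }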

This results in about 200 different log files.

As of now we have no tooling to assist in managing these files. In the following cases this causes us serious headaches.

  • Troubleshooting when we do not know in advance in which process the problem occurred. In this case we currently ssh in to every machine and start grepping.
  • Trying to be proactive by regularly checking the logs for anything out of the ordinary. In this case we also currently log in to all machines and look at different logs using less and tail.
  • Setting up alerts. We are looking to set up alerts on events over a threshold. This is looking to be a pain with 200 log files to check.

Today we have only about 5 log events per second, but that will increase as we migrate more and more code to the new platform.

I would like to ask the community the following questions:

  • How do you handle similar cases with many log files, distributed over several machines, logged through different frameworks?
  • Why did you choose that particular solution?
  • How did your solution work out? What did you find good, and what did you find bad?

Many thanks.

Update

We ended up evaluating a trial version of Splunk. We are very happy with how it works and have decided to purchase it. Easy to set up, fast searches and a ton of features for the technically inclined. I can recommend that anyone in a similar situation check it out.

Answer

I would recommend piping all your Java logging to the Simple Logging Facade for Java (SLF4J) and then redirecting all logs from SLF4J to LogBack. SLF4J has special support for handling all popular legacy APIs (log4j, commons-logging, java.util.logging, etc.); see here.
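
For the java.util.logging side, here is a minimal sketch of installing the bridge, assuming the jul-to-slf4j artifact is on the classpath (log4j and commons-logging need no code at all: replacing their jars with log4j-over-slf4j and jcl-over-slf4j is enough):

    import org.slf4j.bridge.SLF4JBridgeHandler;

    public class LegacyLoggingBridge {
        public static void main(String[] args) {
            // Remove j.u.l.'s default handlers, then route everything that
            // goes through java.util.logging into SLF4J (and on to LogBack)
            SLF4JBridgeHandler.removeHandlersForRootLogger();
            SLF4JBridgeHandler.install();

            // From now on, java.util.logging calls end up in LogBack via SLF4J
            java.util.logging.Logger.getLogger("legacy").info("now routed through SLF4J");
        }
    }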

Once you have your logs in LogBack you can use one of its many appenders to aggregate logs over several machines; for details, see the manual section about appenders. Socket, JMS and SMTP seem to be the most obvious candidates.
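
As a rough illustration, a programmatic sketch of attaching a SocketAppender to the root logger; in practice this would normally live in logback.xml, and the collector host and port here are assumptions:

    import ch.qos.logback.classic.Logger;
    import ch.qos.logback.classic.LoggerContext;
    import ch.qos.logback.classic.net.SocketAppender;
    import org.slf4j.LoggerFactory;

    public class CentralizedLogging {
        public static void main(String[] args) {
            LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();

            // Serialize every event and ship it to a central collector,
            // e.g. LogBack's own SimpleSocketServer running on one machine
            SocketAppender socket = new SocketAppender();
            socket.setContext(context);
            socket.setRemoteHost("log-collector.example.com"); // assumed host
            socket.setPort(4560);                              // assumed port
            socket.start();

            Logger root = context.getLogger(Logger.ROOT_LOGGER_NAME);
            root.addAppender(socket);
            root.info("this event also goes to the central collector");
        }
    }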

LogBack also has built-in support for monitoring for special conditions in log files and for filtering the events sent to a particular appender. So you could set up an SMTP appender to send you an e-mail every time there is an ERROR level event in the logs.
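
A hedged sketch of such an SMTP setup, configured programmatically for brevity; the host and addresses are placeholders, and the appender's default evaluator already triggers on ERROR-level events, so no extra filter is needed:

    import ch.qos.logback.classic.Logger;
    import ch.qos.logback.classic.LoggerContext;
    import ch.qos.logback.classic.PatternLayout;
    import ch.qos.logback.classic.net.SMTPAppender;
    import org.slf4j.LoggerFactory;

    public class ErrorMailAlerts {
        public static void main(String[] args) {
            LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();

            // Mails one message per ERROR event by default
            SMTPAppender mail = new SMTPAppender();
            mail.setContext(context);
            mail.setSmtpHost("smtp.example.com");    // hypothetical relay
            mail.setFrom("platform@example.com");    // hypothetical addresses
            mail.addTo("oncall@example.com");
            mail.setSubject("Platform ERROR: %logger{20}");

            // Layout for the mail body
            PatternLayout layout = new PatternLayout();
            layout.setContext(context);
            layout.setPattern("%date %level [%thread] %logger - %msg%n");
            layout.start();
            mail.setLayout(layout);
            mail.start();

            context.getLogger(Logger.ROOT_LOGGER_NAME).addAppender(mail);
        }
    }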

Finally, to ease troubleshooting, be sure to add some sort of requestID to all your incoming "requests", see my answer to this question for details.
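
One common way to carry such a requestID with SLF4J/LogBack is the Mapped Diagnostic Context (MDC); a small sketch, assuming the layout pattern includes %X{requestId} (the handler and key names are illustrative):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.slf4j.MDC;

    public class RequestHandler {
        private static final Logger log = LoggerFactory.getLogger(RequestHandler.class);

        // Every log line emitted while handling this request carries its id
        public void handle(String requestId) {
            MDC.put("requestId", requestId);
            try {
                log.info("processing started");
                // ... actual work ...
                log.info("processing finished");
            } finally {
                MDC.remove("requestId"); // don't leak the id into the next request
            }
        }
    }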

EDIT: you could also implement your own custom LogBack appender and redirect all logs to Scribe.
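
A skeleton of what such a custom appender could look like; the ScribeClient type here is a hypothetical stand-in for a real Thrift-generated Scribe client, not an actual library class:

    import ch.qos.logback.classic.spi.ILoggingEvent;
    import ch.qos.logback.core.AppenderBase;

    public class ScribeAppender extends AppenderBase<ILoggingEvent> {

        // Hypothetical minimal client interface; a real deployment would use
        // a Thrift-generated Scribe client here instead
        public interface ScribeClient {
            void log(String category, String message);
        }

        private final ScribeClient client;

        public ScribeAppender(ScribeClient client) {
            this.client = client;
        }

        @Override
        protected void append(ILoggingEvent event) {
            // Forward the formatted event to the central Scribe service
            client.log("platform", event.getFormattedMessage());
        }
    }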
