Storage of many log files


Question

I have a system that receives log files from different places over HTTP (>10k producers, 10 logs per day, ~100 lines of text each).

I would like to store them so that I can compute miscellaneous statistics over them nightly, export them (ordered by date of arrival or by first-line content), and so on.

My question is: what's the best way to store them?

  • Flat text files (with proper locking), one file per uploaded file, one directory per day/producer
  • Flat text files, one (big) file per day for all producers (the problems here will be indexing and locking)
  • Database table with a text column (MySQL is preferred for internal reasons); the problem is purging, since a DELETE can take a very long time
  • Database table with one record per line of text
  • Database with sharding (one table per day), allowing a simple data purge. (This is really partitioning, but the version of MySQL I have access to, i.e. the one supported internally, does not support it.)
  • Document-based DB à la CouchDB or MongoDB (the problem could be indexing / maturity / speed of ingestion)

Any advice?

Solution

I'd pick the very first solution.

I don't see why you would need a DB at all. It seems like all you need is to scan through the data. Keep the logs in their most "raw" state, process them, and then create a tarball for each day.
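
The daily tarball step could look roughly like the sketch below. It is only an illustration under assumptions that are not in the question: that each day's raw logs sit under a directory named after the day (for example `logs/2024-01-31/`), that gzip compression is wanted, and the function name `archive_day` is made up for this example.

```python
import tarfile
from pathlib import Path


def archive_day(log_root: Path, day: str, archive_dir: Path) -> Path:
    """Pack log_root/<day>/ into archive_dir/<day>.tar.gz and return the archive path.

    Assumed layout (hypothetical): one directory per day under log_root.
    """
    day_dir = log_root / day
    if not day_dir.is_dir():
        raise FileNotFoundError(f"no log directory for {day}: {day_dir}")

    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_path = archive_dir / f"{day}.tar.gz"

    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname makes paths inside the archive start at "<day>/..."
        # instead of repeating the full path of log_root.
        tar.add(str(day_dir), arcname=day)

    return archive_path
```

This would run after the nightly statistics job has finished with that day's files. Deleting the raw directory afterwards is left out on purpose, so that a failed archive never costs any data.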

The only reason to aggregate would be to reduce the number of files. On some file systems, performance drops rapidly once a directory holds more than some number N of files. Check your filesystem, and if that is the case, organize a simple 2-level hierarchy, say, using the first 2 digits of the producer ID as the first-level directory name.
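
As a rough sketch of that 2-level hierarchy: the layout `<root>/<day>/<first 2 digits>/<producer ID>/<file>` is an assumption that combines the per-day/per-producer directories from the question with the prefix level suggested here, and the name `log_path` is hypothetical.

```python
from pathlib import Path


def log_path(root: Path, day: str, producer_id: str, upload_name: str) -> Path:
    """Return where one uploaded log file would be stored.

    Assumed layout: <root>/<day>/<first 2 digits of producer ID>/<producer ID>/<file>
    """
    # Left-pad short IDs so that every producer gets a 2-character prefix.
    prefix = producer_id.zfill(2)[:2]
    return root / day / prefix / producer_id / upload_name


def store_upload(root: Path, day: str, producer_id: str,
                 upload_name: str, content: bytes) -> Path:
    """Create the directories if needed and write the upload unchanged."""
    target = log_path(root, day, producer_id, upload_name)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target
```

With numeric producer IDs this gives at most 100 first-level directories per day, so 10k producers would average about 100 producer directories in each, assuming the IDs are spread evenly over the prefixes.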
