How to speed up this log parser?

Problem description

I have a gigabytes-large log file in this format:

2016-02-26 08:06:45 Blah blah blah

I have a log parser which splits the single log file into separate files by date, trimming the date from each line.

I do want some form of tee so that I can see how far along the process is.

The problem is that this method is mind-numbingly slow. Is there no way to do this quickly in bash? Or will I have to whip up a little C program to do it?

log_file=server.log
log_folder=logs

mkdir -p "$log_folder"

# For every input line this forks a subshell for echo and starts a new tee process
while IFS= read -r a; do
   date=${a:0:10}

   echo "${a:11}" | tee -a "$log_folder/$date"
done < "$log_file"
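Most of the cost in the loop above comes from the `echo | tee` pipeline, which creates a subshell and a new `tee` process for every line. A minimal pure-bash sketch that avoids those per-line processes is below. `split_log` is a hypothetical helper name, and, like the original loop, it assumes every line starts with a `YYYY-MM-DD ` prefix. It keeps one file descriptor open and reopens it only when the date changes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_log INPUT OUTDIR
# Assumes every line starts with "YYYY-MM-DD " (same assumption as the loop above).
split_log() {
  local log_file=$1 log_folder=$2 line date current=""
  mkdir -p "$log_folder" || return 1
  # "|| [[ -n $line ]]" also handles a last line with no trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    date=${line:0:10}
    if [[ $date != "$current" ]]; then
      # Reopen fd 3 only when the date changes; ">>" appends, so a date
      # that shows up again later is not truncated.
      exec 3>>"$log_folder/$date"
      current=$date
    fi
    printf '%s\n' "${line:11}" >&3   # trimmed line into the per-date file
    printf '%s\n' "${line:11}"       # same line on stdout as progress output
  done < "$log_file"
  exec 3>&-                          # close the last per-date file
}
```

Usage: `split_log server.log logs`. This removes the per-line process creation, but a bash `read` loop is still typically much slower than awk on a multi-gigabyte file.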

Recommended answer

Try this awk solution. It should be pretty fast, it shows progress, and it keeps only one output file open at a time. Lines that don't start with a date are written to the current date's file, so no lines are lost; if the log begins with undated lines, they go to a default initial date of "0000-00-00".

Any timing comparison would be much appreciated.

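#!/usr/bin/env bash
# Usage: ./split.sh outdir < logfile
dir=$1
if [[ -z $dir ]]; then
  echo >&2 "Usage: $0 outdir <logfile"
  echo >&2 "outdir: directory where output files are created"
  echo >&2 "logfile: input on stdin to split into output files"
  exit 1
fi
mkdir -p "$dir"
echo "output directory \"$dir\""
awk -v dir="$dir" '
BEGIN {
  # Spelled out instead of {4}/{2} so awks without interval support also match
  datepat="^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$"
  date="0000-00-00"      # default file for lines before the first dated line
  file=dir"/"date
}
# A new date starts: close the previous file and print a date header
date != $1 && $1 ~ datepat {
  close(file)
  print ""
  print $1 ":"
  date=$1
  file=dir"/"date
}
{
  # Strip the "YYYY-MM-DD " prefix (11 characters) from dated lines only
  if($1 ~ datepat)
    line=substr($0,12)
  else
    line=$0
  print line             # progress output on stdout
  # ">>" rather than ">": awk truncates a file whenever ">" reopens it after
  # close(), losing data if a date reappears later. Note that re-running
  # into the same outdir therefore appends to the existing files.
  print line >>file
}
'
head -6 "$dir"/*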
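The script above uses every line echoed to stdout as its progress indicator. On a multi-gigabyte log, writing all of that to a terminal can itself take a large share of the run time. One lighter alternative, sketched here as an assumption rather than part of the answer, is a filter that passes input through unchanged and prints a line count to stderr every N lines; `progress` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy stdin to stdout unchanged and report a running
# line count on stderr every N lines (default 100000).
progress() {
  awk -v every="${1:-100000}" '
    { print }                                          # pass each line through
    NR % every == 0 { printf "%d lines\n", NR > "/dev/stderr" }
    END { printf "done: %d lines\n", NR > "/dev/stderr" }
  '
}
```

For example, `progress 1000000 < server.log | ./split.sh logs > /dev/null` shows a count every million lines while discarding the per-line stdout output (here `split.sh` stands for wherever the answer's script was saved).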

Sample input log

first line without date
2016-02-26 08:06:45 0 Blah blah blah
2016-02-26 09:06:45 1 Blah blah blah
2016-02-27 07:06:45 2 Blah blah blah
2016-02-27 08:06:45 3 Blah blah blah
no date line
blank lines

another no date line
2016-02-28 07:06:45 4 Blah blah blah
2016-02-28 08:06:45 5 Blah blah blah

Output (run with tmpd as the output directory)

first line without date

2016-02-26:
08:06:45 0 Blah blah blah
09:06:45 1 Blah blah blah

2016-02-27:
07:06:45 2 Blah blah blah
08:06:45 3 Blah blah blah
no date line
blank lines

another no date line

2016-02-28:
07:06:45 4 Blah blah blah
08:06:45 5 Blah blah blah

==> tmpd/0000-00-00 <==
first line without date

==> tmpd/2016-02-26 <==
08:06:45 0 Blah blah blah
09:06:45 1 Blah blah blah

==> tmpd/2016-02-27 <==
07:06:45 2 Blah blah blah
08:06:45 3 Blah blah blah
no date line
blank lines

another no date line

==> tmpd/2016-02-28 <==
07:06:45 4 Blah blah blah
08:06:45 5 Blah blah blah
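For the timing comparison requested above, a large synthetic input can stand in for a real log. The helper below is a hypothetical sketch, not part of the answer: `gen_log N` writes N lines in the same format as the sample input, spread evenly over three dates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gen_log N writes N synthetic lines in the sample's
# "YYYY-MM-DD HH:MM:SS n Blah blah blah" format, spread over 2016-02-26..28.
gen_log() {
  awk -v n="$1" 'BEGIN {
    for (i = 0; i < n; i++)
      printf "2016-02-%02d 08:06:45 %d Blah blah blah\n", 26 + int(i * 3 / n), i
  }'
}
```

For example, `gen_log 5000000 > big.log` followed by `time ./split.sh tmpd < big.log > /dev/null` times the awk script (with `split.sh` standing for wherever it was saved); the same input can be fed to the original bash loop for comparison.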
