Split access.log file by dates using command line tools

Problem description

I have an Apache access.log file that is around 35 GB in size. Grepping through it is no longer an option without a long wait.

I want to split it into many smaller files, using the date as the splitting criterion.

The date is in the format "[15/Oct/2011:12:02:02 +0000]". Any idea how I could do this using only bash scripting, standard text manipulation programs (grep, awk, sed, and the like), piping, and redirection?

The input file name is access.log. I'd like the output files to have names such as access.apache.15_Oct_2011.log (that would do the trick, although it does not sort nicely).

Recommended answer

One way, using awk (this groups lines by month rather than by day):

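awk 'BEGIN {
    # Map month abbreviations to numbers: m["Jan"] = 1 ... m["Dec"] = 12
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", months, " ")
    for (a = 1; a <= 12; a++)
        m[months[a]] = a
}
{
    # $4 looks like "[15/Oct/2011:12:02:02"; split it on "/" and ":"
    split($4, array, "[:/]")
    year = array[3]
    month = sprintf("%02d", m[array[2]])

    # Parenthesize the target so every awk parses the concatenation the same way
    print > (FILENAME "-" year "_" month ".txt")
}' incendiary.ws-2010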

This will output files like:

incendiary.ws-2010-2010_04.txt
incendiary.ws-2010-2010_05.txt
incendiary.ws-2010-2010_06.txt
incendiary.ws-2010-2010_07.txt

Against a 150 MB log file, the answer by chepner took 70 seconds on a 3.4 GHz 8-core Xeon E31270, while this method took 5 seconds.

Original inspiration: "How to split an existing Apache log file by month"
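
The awk answer above writes one file per month. For the per-day names the question asks for (access.apache.15_Oct_2011.log), the same idea can be adapted. The following is only a minimal sketch, not part of the original answer: the function name split_by_day is made up for illustration, and it assumes the timestamp is the 4th whitespace-separated field (as in the Apache common and combined log formats) and that the log is in chronological order, so only one output file needs to be open at a time.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): split an Apache access
# log into one file per day, named access.apache.DD_Mon_YYYY.log.
# Assumption: field 4 looks like "[15/Oct/2011:12:02:02".
split_by_day() {
    awk '
    {
        # Drop the leading "[" and split "15/Oct/2011:12:02:02" on "/" and ":"
        split(substr($4, 2), t, "[/:]")
        out = "access.apache." t[1] "_" t[2] "_" t[3] ".log"

        # Close the previous file when the day changes, so a log spanning
        # many days does not run into the open-file limit
        if (out != prev) {
            if (prev != "") close(prev)
            prev = out
        }

        # Append, because a file closed earlier may be reopened later
        print >> out
    }' "$1"
}
```

Running split_by_day access.log in the directory where the output should go creates one file per day found in the log. Because the sketch appends, running it twice on the same input duplicates lines, so the old output files should be removed first. For names that sort chronologically, the month lookup table from the answer above could be reused to emit YYYY_MM_DD instead.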
