Split access.log file by dates using command line tools
Problem description
I have an Apache access.log file that is around 35 GB in size. Grepping through it is no longer an option without a long wait.
I want to split it into many smaller files, using the date as the splitting criterion.
Dates are in the format "[15/Oct/2011:12:02:02 +0000]". Any idea how I could do this using only bash scripting, standard text-processing programs (grep, awk, sed, and the like), piping and redirection?
The input file name is access.log. I'd like the output files to be named like access.apache.15_Oct_2011.log (that would do the trick, although such names don't sort nicely).
Recommended answer
One way to do it, using awk:
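awk 'BEGIN {
    # Map month abbreviations to numbers: m["Jan"] = 1, ..., m["Dec"] = 12
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", months, " ")
    for (a = 1; a <= 12; a++)
        m[months[a]] = a
}
{
    # $4 looks like "[15/Oct/2011:12:02:02"; splitting on "/" and ":"
    # gives array[2] = "Oct" and array[3] = "2011"
    split($4, array, "[:/]")
    year = array[3]
    month = sprintf("%02d", m[array[2]])
    # Parenthesize the target so the file-name concatenation is unambiguous
    print > (FILENAME "-" year "_" month ".txt")
}' incendiary.ws-2010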
This will output files such as:
incendiary.ws-2010-2010_04.txt
incendiary.ws-2010-2010_05.txt
incendiary.ws-2010-2010_06.txt
incendiary.ws-2010-2010_07.txt
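Because the month is zero-padded, these names sort chronologically, which addresses the question's complaint about names like 15_Oct_2011. A minimal sketch of the same month table applied at day granularity follows; `split_by_day_sorted` is a hypothetical helper name, and like the answer's script it assumes every line carries a well-formed "[DD/Mon/YYYY:..." timestamp in field 4. As a design choice (not part of the original answer), it closes each day's file when the date changes so only one output file is open at a time, and writes with `>>` so a day that reappears (for example an out-of-order line near midnight) keeps its earlier lines.

```shell
# Minimal sketch: per-day split with sortable names such as
# access.apache.2011_10_15.log. Assumes every line has a well-formed
# "[DD/Mon/YYYY:..." timestamp in field 4, as the awk answer does.
# split_by_day_sorted is a hypothetical helper name.
split_by_day_sorted() {
    awk 'BEGIN {
        # Month abbreviation -> number, as in the answer: m["Oct"] = 10
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", months, " ")
        for (i = 1; i <= 12; i++)
            m[months[i]] = i
    }
    {
        split(substr($4, 2), d, "[/:]")    # -> d[1]="15", d[2]="Oct", d[3]="2011"
        out = sprintf("access.apache.%s_%02d_%02d.log", d[3], m[d[2]], d[1])
        if (out != prev) {
            # Close the previous day so only one output file stays open
            if (prev != "")
                close(prev)
            prev = out
        }
        # Append mode: a day reopened after close() keeps its earlier lines.
        # Remove old output files before rerunning, or lines are duplicated.
        print >> out
    }' "$1"
}

# Usage:
#   rm -f access.apache.*.log && split_by_day_sorted access.log
```

The trade-off of appending is that reruns are not idempotent, hence the `rm -f` in the usage line.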
Against a 150 MB log file, the answer by chepner took 70 seconds on a 3.4 GHz 8-core Xeon E31270, while this method took 5 seconds.
Originally inspired by: How to split an existing Apache log file by month?