Processing Apache logs quickly
Question
I'm currently running an awk script to process a large (8.1 GB) access-log file, and it's taking forever to finish. In 20 minutes it has written 14 MB of the (1000 ± 500) MB I expect it to write, and I wonder if I can process it much faster somehow.
Here is the awk script:
#!/bin/bash
awk '{t=$4" "$5; gsub(/[\[\]\/]/," ",t); sub(":"," ",t); printf("%s,",$1); system("date -d \""t"\" +%s");}' "$1"
Edit:
For non-awkers: the script reads each line, extracts the date information, reformats it into a form the date utility recognizes, calls date to represent it as the number of seconds since 1970, and finally outputs it as a line of a .csv file, together with the IP.
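To see where the time goes: each input line costs one fork/exec of an external date process. A single conversion, using the example timestamp from the input shown in this question, amounts to (assuming GNU date):

```shell
# What system() runs for each log line, after the gsub/sub calls turn
# "[22/Jan/2010:05:54:55 +0100]" into a format date(1) understands:
date -d "22 Jan 2010 05:54:55 +0100" +%s   # prints 1264136095
```

Spawning a shell plus a date process millions of times dwarfs the cost of the text processing itself.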
Example input: 189.5.56.113 - - [22/Jan/2010:05:54:55 +0100] "GET (...)"
Returned output: 189.5.56.113,124237889
Answer
@OP, your script is slow mainly because it shells out to the date command for every single line of the file, and it's a big file as well (in the GB range). If you have gawk, use its built-in mktime() function to do the date-to-epoch-seconds conversion instead:
awk 'BEGIN{
    # map month abbreviations to two-digit numbers: Jan -> 01 ... Dec -> 12
    m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
    for(o=1;o<=m;o++){
        date[d[o]]=sprintf("%02d",o)
    }
}
{
    # $4 holds "[22/Jan/2010:05:54:55", $5 holds "+0100]"
    gsub(/\[/,"",$4); gsub(":","/",$4); gsub(/\]/,"",$5)
    n=split($4, DATE,"/")
    day=DATE[1]
    mth=DATE[2]
    year=DATE[3]
    hr=DATE[4]
    min=DATE[5]
    sec=DATE[6]
    # mktime() takes "YYYY MM DD HH MM SS" and returns seconds since the epoch
    MKTIME=mktime(year" "date[mth]" "day" "hr" "min" "sec)
    print $1,MKTIME
}' file
Output
$ more file
189.5.56.113 - - [22/Jan/2010:05:54:55 +0100] "GET (...)"
$ ./shell.sh
189.5.56.113 1264110895