Parsing apache log files
Question
I just started learning Python and would like to read an Apache log file and put parts of each line into different lists.
Line from the file:
172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"
According to the Apache website, the format is
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"
I'm able to open the file and read it as it is, but I don't know how to make it read in that format so I can put each part in a list.
Answer
This is a job for regular expressions. For example, the following code prints a tuple with 6 pieces of information from the line (specifically, the groups within parentheses in the pattern):
import re
line = '172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"'
# Groups: client IP, timestamp, request line, status code, referer, user agent
regex = r'([\d.]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'
print(re.match(regex, line).groups())
('172.16.0.3', '25/Sep/2002:14:04:19 +0200', 'GET / HTTP/1.1', '401', '', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827')
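The pattern above hard-codes the `- -` (identd and user) and `-` (byte count) fields, so it will not match lines where those fields hold real values, such as a logged-in user name or a numeric response size. Below is a minimal sketch, assuming every line follows the combined format quoted in the question and that no quoted field contains an escaped `\"`. It uses one named group per format directive and collects each field into its own list, which is what the question asks for. The names `COMBINED_RE`, `parse_log`, and the path `access.log` are hypothetical, chosen for illustration.

```python
import re

# Hypothetical pattern for: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# One named group per directive; \S+ lets ident, user and bytes be "-" or a real value.
# Assumption: quoted fields contain no escaped \" sequences.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log(lines):
    """Return a dict mapping each field name to a list of values, one per matching line."""
    columns = {name: [] for name in COMBINED_RE.groupindex}
    for line in lines:
        m = COMBINED_RE.match(line)
        if m is None:
            continue  # skip lines in some other format (or truncated lines)
        for name, value in m.groupdict().items():
            columns[name].append(value)
    return columns


if __name__ == "__main__":
    sample = ('172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - "" '
              '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"')
    cols = parse_log([sample])
    print(cols["host"], cols["status"], cols["agent"])
    # For a real log, pass the open file object (hypothetical path):
    # with open("access.log") as f:
    #     cols = parse_log(f)
```

Named groups make the result self-describing, so adding or reordering fields does not shift tuple positions. Non-matching lines are silently skipped here; counting or printing them instead would make format mismatches easier to spot.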