Parsing Apache logs in Perl
UPDATED 5-10-2013
Okay, so now I can filter out the IP addresses with no problem. Now come the next three things I'd like to do, which I thought could easily be done with sort($keys), but I was wrong, and the slightly more complex approach below didn't seem to be the solution either. The next thing I need to accomplish is gathering dates and browser versions. I will provide a sample of my log file format and my current code.
APACHE LOG
24.235.131.196 - - [10/Mar/2004:00:57:48 -0500] "GET http://www.google.com/iframe.php HTTP/1.0" 500 414 "http://www.google.com/iframe.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
My Code
#!usr/bin/perl -w
use strict;
my %seen = ();
open(FILE, "< access_log") or die "unable to open file $!";
while( my $line = <FILE>) {
    chomp $line;
    # regex for ip address.
    if( $line =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/ ) {
        $seen{$1}++;
    }
    #regex for date an example is [09\Mar\2009:05:30:23]
    if( $line =~ /\[[\d]{2}\\.*[\d]{4}\:[\d]{2}\:[\d]{2}\]*/) {
        print "\n\n $line matched : $_\n";
    }
}
close FILE;
my $i = 0;
# program bugs out if I uncomment the below line,
# but to my understanding this is essentially what I'm trying to do.
# for my $key ( keys %seen ) (keys %date) {
for my $key ( keys %seen ) {
    my ($ip) = sort {$a cmp $b}($key);
    # also I'd like to be able to sort the IP addresses and if
    # I do it the proper numeric way it generates errors saying contents are not numeric.
    print @$ip->[$i] . "\n";
    # print "The IPv4 address is : $key and has accessed the server $seen{$key} times. \n";
    $i++;
}
You're pretty close. And yes, I would use a hash. It's commonly called a "seen hash".
#!/usr/bin/perl
use warnings;
use strict;
my $log = "web.log";
my %seen = ();
# 3-arg open with a lexical filehandle; the error names the file and the reason
open (my $fh, "<", $log) or die "unable to open $log: $!";
while( my $line = <$fh> ) {
    chomp $line;
    # count every line that contains something shaped like an IPv4 address
    if( $line =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/ ){
        $seen{$1}++;
    }
}
close $fh;
for my $key ( keys %seen ) {
    print "$key: $seen{$key}\n";
}
Here's a sample log file with some output:
$ cat web.log
[Mon Sep 21 02:35:24 1999] some msg blah blah
[Mon Sep 21 02:35:24 1999] 192.1.1.1
[Mon Sep 21 02:35:24 1999] 1.1.1.1
[Mon Sep 21 02:35:24 1999] 10.1.1.9
[Mon Sep 21 02:35:24 1999] 192.1.1.1
[Mon Sep 21 02:35:24 1999] 10.1.1.5
[Mon Sep 21 02:35:24 1999] 10.1.1.9
[Mon Sep 21 02:35:24 1999] 192.1.1.1
$ test.pl
1.1.1.1: 1
192.1.1.1: 3
10.1.1.9: 2
10.1.1.5: 1
A few things I would be careful of:
my @array = <FH>;
This will pull the entire file into memory, which isn't a great idea, especially for log files, which can grow pretty large (even more so if they are not rotated properly). Reading the handle with for or foreach will have this same problem; while is the best practice for reading from a file.
You should be in the habit of using the 3-arg, lexically scoped open, as in my example above.
Your die statement shouldn't be so "precise". See the message I pass to die: it names the file and lets $! supply the reason, since the failure could be permissions, a missing file, a lock, etc.
UPDATE
This will work for your dates.
my $line = '[09\Mar\2009:05:30:23]: plus some message';
#example is [09\Mar\2009:05:30:23]
if( $line =~ /(\[[\d]{2}\\.*\\[\d]{4}:[\d]{2}:[\d]{2}:[\d]{2}\])/ ){
    print "$line matched: $1\n";
}
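Note that the sample line from the actual access log uses forward slashes and carries a timezone offset ([10/Mar/2004:00:57:48 -0500]), so a pattern written for backslashes would not match it as-is. Below is a minimal sketch for that format, assuming the log follows the usual Apache "combined" layout. The sub name parse_line and the field names it returns are made up for illustration, and the user agent is taken to be the last quoted field on the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the IP, timestamp parts and user agent out of
# one line in Apache "combined" format. Returns nothing (undef in scalar
# context) if the line does not look like that format.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^(\d{1,3}(?:\.\d{1,3}){3})        # client IP
        \s+\S+\s+\S+\s+                   # identd and user (often "-")
        \[(\d{2})/(\w{3})/(\d{4})         # day / month / year
        :(\d{2}:\d{2}:\d{2})              # time of day
        \s+([+-]\d{4})\]                  # timezone offset
        .*"([^"]*)"\s*$                   # last quoted field: user agent
    }x;
    return {
        ip    => $1,
        date  => "$2/$3/$4",
        time  => $5,
        tz    => $6,
        agent => $7,
    };
}

my $sample = '24.235.131.196 - - [10/Mar/2004:00:57:48 -0500] "GET http://www.google.com/iframe.php HTTP/1.0" 500 414 "http://www.google.com/iframe.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"';
if ( my $rec = parse_line($sample) ) {
    print "$rec->{ip} on $rec->{date} at $rec->{time} $rec->{tz}: $rec->{agent}\n";
}
```

Returning a hash reference keeps the pieces separate, so the date can go into one hash and the browser string into another without re-matching the line.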
UPDATE2
There are a few things you've done wrong.
I don't see you storing anything in a date hash. This line:
print "\n\n $line matched : $_\n";
should look like the assignment to your seen hash, though that doesn't make too much sense on its own. What are you trying to do with the stored date data?
$data{$1} = "some value, which is up to you";
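If the goal is simply to count requests per day, the date hash can be filled the same way as the seen hash. A minimal sketch, assuming the forward-slash date format from the sample access log line; the names count_dates and %date are illustrative, and the three input lines are made-up test data:

```perl
use strict;
use warnings;

# Hypothetical helper: count how many lines fall on each dd/Mon/yyyy date.
sub count_dates {
    my (@lines) = @_;
    my %date;
    for my $line (@lines) {
        # Capture the day part of e.g. [10/Mar/2004:00:57:48 -0500]
        if ( $line =~ m{\[(\d{2}/\w{3}/\d{4}):} ) {
            $date{$1}++;
        }
    }
    return \%date;
}

# Made-up sample lines, for illustration only.
my $date = count_dates(
    '1.1.1.1 - - [10/Mar/2004:00:57:48 -0500] "GET / HTTP/1.0" 200 10',
    '1.1.1.2 - - [10/Mar/2004:01:02:03 -0500] "GET / HTTP/1.0" 200 10',
    '1.1.1.1 - - [11/Mar/2004:00:00:01 -0500] "GET / HTTP/1.0" 200 10',
);
print "$_: $date->{$_}\n" for sort keys %$date;
```

The important detail is the capture group: without parentheses in the pattern, $1 is not set by the match and there is nothing to use as the hash key.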
You cannot loop over two hashes in one for loop, so this is not valid:
for my $foo (keys %h)(keys %h2) { # do stuff }
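What does work is either one loop per hash, or a single loop over the combined key list. A short sketch of both, using the made-up hashes %h and %h2 with invented contents:

```perl
use strict;
use warnings;

# Invented sample data, for illustration only.
my %h  = ( a => 1,  b => 2 );
my %h2 = ( b => 20, c => 30 );

# Option 1: one loop per hash.
for my $key ( sort keys %h )  { print "h:  $key => $h{$key}\n" }
for my $key ( sort keys %h2 ) { print "h2: $key => $h2{$key}\n" }

# Option 2: a single loop over the union of both key lists.
# The temporary %union hash collapses keys that appear in both.
my %union    = map { $_ => 1 } ( keys %h, keys %h2 );
my @all_keys = sort keys %union;
for my $key (@all_keys) {
    my $v1 = exists $h{$key}  ? $h{$key}  : '-';
    my $v2 = exists $h2{$key} ? $h2{$key} : '-';
    print "$key: $v1 / $v2\n";
}
```

Which option fits depends on whether the two hashes share keys; for an IP hash and a date hash they don't, so two separate loops are probably the simpler choice.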
And for the last sorting bit, you should just sort the keys:
for my $key (sort keys %seen ) {
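A plain sort keys compares the addresses as strings, so 10.1.1.9 would come before 9.1.1.1, and comparing the dotted strings with <=> is what triggers the "isn't numeric" warnings. One common way to get numeric order, sketched here with a hypothetical ip_sort helper, is to pack the four octets into a 4-byte string and compare those. This assumes every key is a well-formed IPv4 address with octets in the range 0 to 255.

```perl
use strict;
use warnings;

# Hypothetical helper: sort dotted-quad IPv4 strings in numeric order.
# pack 'C4' turns the four octets into four bytes, so a plain string
# comparison of the packed values orders the addresses numerically.
sub ip_sort {
    my (@ips) = @_;
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ pack( 'C4', split /\./ ), $_ ] } @ips;
}

# Invented counts, for illustration only.
my %seen = ( '192.1.1.1' => 3, '10.1.1.9' => 2, '9.1.1.1' => 1, '10.1.1.5' => 1 );
for my $key ( ip_sort( keys %seen ) ) {
    print "$key: $seen{$key}\n";
}
```

Packing each address once up front (rather than inside the sort block) avoids redoing the split for every comparison.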