Parsing Apache logs in Perl


Problem description

UPDATED 5-10-2013

Okay, so now I can filter out the IP addresses with no problem. Next come a few more things I'd like to do. I thought they could easily be done with sort($keys), but I was wrong, and the slightly more complex approach below doesn't seem to be the solution either. What I need to accomplish next is gathering the dates and the browser version, and sorting the IP addresses. Below are a sample line from my log files and my current code.

APACHE LOG

24.235.131.196 - - [10/Mar/2004:00:57:48 -0500] "GET http://www.google.com/iframe.php HTTP/1.0" 500 414 "http://www.google.com/iframe.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"

My Code

#!/usr/bin/perl -w
use strict;

my %seen = ();
open(FILE, "< access_log") or die "unable to open file  $!";    

while( my $line = <FILE>) {
    chomp $line;

    # regex for ip address.
    if( $line =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/ ) {  
        $seen{$1}++;
    }

    #regex for date an example is [09\Mar\2009:05:30:23]
    if( $line =~ /\[[\d]{2}\\.*[\d]{4}\:[\d]{2}\:[\d]{2}\]*/) {
        print "\n\n $line matched : $_\n";
    }

}
close FILE;
my $i = 0;

# program bugs out if I uncomment the below line, 
# but to my understanding this is essentially what I'm trying to do.
# for my $key ( keys %seen ) (keys %date) {
for my $key ( keys %seen ) {
    my ($ip) = sort {$a cmp $b}($key); 
    # also I'd like to be able to sort the IP addresses and if 
    # I do it the proper numeric way it generates errors saying contents are not numeric. 
    print @$ip->[$i] . "\n";
    # print "The IPv4 address is : $key and has accessed the server $seen{$key} times. \n";
    $i++;
}

Solution

You're pretty close. And yes, I would use a hash. It's commonly called a "seen hash".

#!/usr/bin/perl

use warnings;
use strict;

my $log = "web.log";
my %seen = ();

open (my $fh, "<", $log) or die "unable to open $log: $!"; 

while( my $line = <$fh> ) {
    chomp $line;

    if( $line =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/ ){
        $seen{$1}++;
    }
}
close $fh;

for my $key ( keys %seen ) {
    print "$key: $seen{$key}\n";
}

Here's a sample log file with some output:

$ cat web.log 
[Mon Sep 21 02:35:24 1999] some msg blah blah
[Mon Sep 21 02:35:24 1999] 192.1.1.1
[Mon Sep 21 02:35:24 1999] 1.1.1.1
[Mon Sep 21 02:35:24 1999] 10.1.1.9
[Mon Sep 21 02:35:24 1999] 192.1.1.1
[Mon Sep 21 02:35:24 1999] 10.1.1.5
[Mon Sep 21 02:35:24 1999] 10.1.1.9
[Mon Sep 21 02:35:24 1999] 192.1.1.1
$ test.pl
1.1.1.1: 1
192.1.1.1: 3
10.1.1.9: 2
10.1.1.5: 1

A few things I would be careful of:

my @array = <FH>; will pull the entire file into memory, which isn't a great idea, especially for log files: they can grow pretty large, even more so if they aren't rotated properly. A for or foreach loop over <FH> has the same problem, because it reads every line before the first iteration. A while loop reads one line at a time and is the best practice for reading from a file.

You should be in the habit of using the 3-arg open with a lexically scoped filehandle, as in my example above.

Your die message shouldn't be so "precise" about what went wrong. See the message in my die, which reports $!, since the reason could be permissions, a file that doesn't exist, a lock, etc.

UPDATE

This will work for your dates.

my $line = '[09\Mar\2009:05:30:23]: plus some message';

#example is [09\Mar\2009:05:30:23]
if( $line =~ /(\[[\d]{2}\\.*\\[\d]{4}:[\d]{2}:[\d]{2}:[\d]{2}\])/ ){
   print "$line matched: $1\n"; 
}
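
The pattern above matches the backslash-separated example. The sample line in your question uses forward slashes and a timezone offset ([10/Mar/2004:00:57:48 -0500]), so for that form something along these lines might work. This is only a sketch: the sub name extract_timestamp is made up for illustration, and it assumes the timestamp always has the dd/Mon/yyyy:hh:mm:ss +zzzz shape shown in your sample.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original code): return the
# bracketed timestamp from an access-log line, or undef if there is none.
sub extract_timestamp {
    my ($line) = @_;

    # dd/Mon/yyyy:hh:mm:ss followed by a +hhmm or -hhmm offset
    if ( $line =~ m{\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]} ) {
        return $1;
    }
    return;
}

my $line  = '24.235.131.196 - - [10/Mar/2004:00:57:48 -0500] "GET http://www.google.com/iframe.php HTTP/1.0" 500 414';
my $stamp = extract_timestamp($line);
print "matched: $stamp\n" if defined $stamp;
```

Using m{...} as the delimiter avoids having to escape every forward slash in the date.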

UPDATE2

There are a few things you've done wrong.

I don't see you storing stuff into a date hash.

print "\n\n $line matched : $_\n";

It should look like what you do with your seen hash, though that doesn't make too much sense for dates. What are you trying to do with the stored date data?

$date{$1} = "some value, which is up to you";
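
If what you want is a count per day and per browser, one possible reading is to tally them the same way as the IP addresses. The sketch below assumes every line follows the combined log format of the sample in your question, with the user agent as the last double-quoted field; the sub name parse_line and the hash name %agent are made up for illustration.

```perl
use strict;
use warnings;

# Illustrative helper: pull the IP address, the day, and the user agent
# out of one combined-format log line. Returns an empty list on no match.
sub parse_line {
    my ($line) = @_;
    my ( $ip, $day, $agent ) = $line =~ m{
        ^(\d{1,3}(?:\.\d{1,3}){3}) \s+ \S+ \s+ \S+ \s+   # ip, then two skipped fields
        \[ (\d{2}/[A-Za-z]{3}/\d{4}) : [^\]]+ \]         # [day:time zone]
        .* "([^"]*)" \s* $                               # last quoted field
    }x or return;
    return ( $ip, $day, $agent );
}

my ( %seen, %date, %agent );

my @sample = (
    '24.235.131.196 - - [10/Mar/2004:00:57:48 -0500] "GET http://www.google.com/iframe.php HTTP/1.0" 500 414 "http://www.google.com/iframe.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"',
);

for my $line (@sample) {
    my ( $ip, $day, $ua ) = parse_line($line) or next;
    $seen{$ip}++;
    $date{$day}++;
    $agent{$ua}++;
}

print "$_: $date{$_}\n"  for sort keys %date;
print "$_: $agent{$_}\n" for sort keys %agent;
```

In a real script the for loop over @sample would be the while loop over the filehandle. Lines that don't match are simply skipped here; you may prefer to warn about them.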

You cannot loop over two hashes in one for loop.

for my $foo (keys %h)(keys %h2) { # do stuff }
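
A for loop takes a single list, so the two usual options are to join the key lists with a comma or to write one loop per hash. A minimal sketch, with made-up sample data and a hypothetical helper all_keys:

```perl
use strict;
use warnings;

# Hypothetical helper: return the keys of every hash reference passed in,
# as one flat list.
sub all_keys {
    my @hashes = @_;
    return map { keys %$_ } @hashes;
}

my %seen = ( '10.1.1.9'    => 2, '1.1.1.1' => 1 );
my %date = ( '10/Mar/2004' => 3 );

# Option 1: one loop over both key lists, joined with a comma.
for my $key ( keys(%seen), keys(%date) ) {
    print "$key\n";
}

# The same thing through the helper.
my @keys = all_keys( \%seen, \%date );
print scalar(@keys), " keys in total\n";

# Option 2: keep the hashes apart and loop over each in turn.
for my $key ( sort keys %seen ) {
    print "ip $key: $seen{$key}\n";
}
for my $key ( sort keys %date ) {
    print "date $key: $date{$key}\n";
}
```

Option 2 is usually what you want when the two hashes hold different kinds of data, since inside a combined loop you can't tell which hash a key came from.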

And for the last sorting bit, you should just sort the keys:

for my $key (sort keys %seen ) {
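
A plain sort compares the keys as strings, so 10.1.1.9 comes before 2.1.1.1, and comparing whole addresses with <=> is what triggers the "isn't numeric" warnings you mentioned. If you want the addresses in numeric order, one way is to compare a zero-padded form of each address. This is a sketch, not the only approach; the sub names ip_sort_key and sort_ips are made up for illustration, and it assumes well-formed dotted-quad input.

```perl
use strict;
use warnings;

# Illustrative helper: turn "10.1.1.9" into "010001001009" so that a plain
# string comparison orders addresses octet by octet.
sub ip_sort_key {
    my ($ip) = @_;
    return join '', map { sprintf '%03d', $_ } split /\./, $ip;
}

# Illustrative helper: return the given addresses in numeric order.
sub sort_ips {
    my @ips    = @_;
    my @sorted = sort { ip_sort_key($a) cmp ip_sort_key($b) } @ips;
    return @sorted;
}

my %seen = (
    '192.1.1.1' => 3,
    '1.1.1.1'   => 1,
    '10.1.1.9'  => 2,
    '10.1.1.5'  => 1,
);

for my $key ( sort_ips( keys %seen ) ) {
    print "$key: $seen{$key}\n";
}
```

With the counts from the sample log above, this prints 1.1.1.1, 10.1.1.5, 10.1.1.9 and 192.1.1.1, in that order.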
