Count the number of variable combinations in a logfile using Perl


Problem description

I have this log file:

 New connection: 141.8.83.213:64400 (172.17.0.6:2222) [session: e696835c]
    2016-04-29 21:13:59+0000 [SSHService ssh-userauth on HoneyPotTransport,3,141.8.83.213] login attempt [user1/test123] failed
    2016-04-29 21:14:10+0000 [SSHService ssh-userauth on HoneyPotTransport,3,141.8.83.213] login attempt [user1/test1234] failed
    2016-04-29 21:14:13+0000 [SSHService ssh-userauth on HoneyPotTransport,3,141.8.83.213] login attempt [user1/test123] failed

I want to write a result like this to a file:

Port,Status,Occurrences
64400,failed,2
64400,failed,1

The "Occurrences" column represents the number of times a given combination of login details [username and password] was recorded in the file. For example, user1/test123 is recorded twice from the same IP. How can I do this? At the moment I have two while loops, with a subroutine called inside the first one, like so:

Subroutine

sub counter(){

        $result = 0;
        #open(FILE2, $cowrie) or die "Can't open '$cowrie': $!";
        while(my $otherlines = <LOG2>){

                if($otherlines =~ /login attempt/){
                        ($user, $password) = (split /[\s:\[\]\/]+/, $otherlines)[-3,-2];
                        if($_[1] =~ /$user/ && $_[2] =~ /$password/){
                                $result++;
                        }#if ip matches i think i have to do this with split

                        #print "TEST\n";
                }
        #print "Combo $_[0] and $_[1]\n";

        }
        #print "$result";
        return $result;
}

Main method

sub cowrieExtractor(){

        open(FILE2, $cowrie) or die "Can't open '$cowrie': $!";

        open(LOG2, $path2) or die "Can't open '$path2': $!";

        $seperator = chr(42);
        #To output user and password of login attempt, set $ip variable to the contents of array at that x position of new
        #connection to match the ip of the login attempt
        print FILE2 "SourcePort"."$seperator".
        "Status"."$seperator"."Occurences"."$seperator"."Malicious"."\n";

        $ip = "";
        $port = "";
        $usr = "";
        $pass = "";
        $status = "";
        $frequency = 0;

        #Given this is a user/pass attempt honeypot logger, I will use a wide character to reduce the possibility of stopping
        #the WEKA CSV loader from functioning by using smileyface as seperators.


        while(my $lines = <LOG2>){

                if($lines =~ /New connection/){

                ($ip, $port) = (split /[\[\]\s:()]+/, $lines)[7,8];

                }
                if($lines =~ /login attempt/){#and the ip of the new connection
if($lines =~ /$ip/){
                ($usr, $pass, $status) = (split /[\s:\[\]\/]+/, $lines)[-3,-2,-1];

                        $frequency = counter($ip, $usr, $pass);

                        #print $frequency;
                        if($ip && $port && $usr && $pass && $status ne ""){
                                print FILE2 join "$seperator",($port, $status, $frequency, $end);
                                print FILE2 "\n";
                        }
                }


                }
        }


}

Right now I get 0 under Occurrences in the output. When I tested, it appears to be the value I initialize $result to in the subroutine, i.e. 0, which means the if statement inside the subroutine is not working properly. Any help?

Recommended answer

Here is a basic way to get the expected output. Questions about the context (purpose) remain.

use warnings;
use strict;

my $file = 'logfile.txt';
open my $fh_in, '<', $file or die "Can't open '$file': $!";

# Assemble results for required output in data structure:
# %rept = ( $port => { $usr => { $status => $freq } } );

my %rept;
my ($ip, $port);

while (my $line = <$fh_in>) 
{
    if ($line =~ /New connection/) {
        ($ip, $port) = $line =~ /New connection:\s+([^:]+):(\d+)/;
        next;
    }   

    my ($usr, $status) =  $line =~ m/login\ attempt \s+ \[ ( [^\]]+ ) \] \s+ (\w+)/x;
    if ($usr and $status) {
        $rept{$port}{$usr}{$status}++;
    }   
    else { warn "Line with an unexpected format:\n$line" }
}

# use Data::Dumper;
# print Dumper \%rept;

print "Port,Status,Occurences\n";
foreach my $port (sort keys %rept) {
    foreach my $usr (sort keys %{$rept{$port}}) {
        foreach my $stat ( sort keys %{$rept{$port}{$usr}} ) { 
            print "$port,$stat,$rept{$port}{$usr}{$stat}\n"; 
        }   
    }   

}

With your input copied into a file logfile.txt, this prints


Port,Status,Occurences
64400,failed,2
64400,failed,1

I take the whole user1/test123 (etc.) to identify the user. This can be changed in the regex as needed. Note that this structure will not let you query or organize the data in very different ways; it mostly pulls out what is needed for the required output. Please let me know if explanations are needed.
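If the username and password are wanted as separate fields rather than as one combined key, the regex can capture them individually. The following is only a minimal sketch: parse_attempt is a hypothetical helper that is not part of the code above, and it assumes the username contains no '/' and the password contains no ']'.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer's code): pull the username,
# password and status out of a single "login attempt" line.
# Returns three undefs if the line does not match.
sub parse_attempt {
    my ($line) = @_;
    # Assumes no '/' in the username and no ']' in the password.
    my ($user, $pass, $status) =
        $line =~ m{login \s attempt \s+ \[ ([^/\]]+) / ([^\]]*) \] \s+ (\w+)}x;
    return ($user, $pass, $status);
}

my $sample = '2016-04-29 21:13:59+0000 [SSHService ssh-userauth on '
           . 'HoneyPotTransport,3,141.8.83.213] login attempt [user1/test123] failed';

my ($user, $pass, $status) = parse_attempt($sample);
print "$user,$pass,$status\n";    # prints "user1,test123,failed"
```

With the fields separated, the counting hash could be keyed by user and password individually, for example $rept{$port}{$user}{$pass}{$status}++, at the cost of one more loop level when printing.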

An introductory explanation of the nested hash used above

First, I strongly recommend a good reading of some of the many materials available. A good start is surely the standard tutorial on Perl references (perlreftut), as well as the cookbook on Perl data structures (perldsc).

The hash used to collect data has port numbers as keys, and the value for each is a hash reference (or, rather, an anonymous hash). Each of these hashes has users as keys, whose values are again hash references. The keys of those innermost hashes are the possible values of status, so there can be two keys (failed and succeeded). Their values are frequencies. This kind of 'nesting' is a complex data structure. There is another important thing. The first time the statement $rept{$port}{$usr}{$status}++ is executed, the whole hierarchy is created, so the key $port does not need to exist beforehand. Importantly, this autovivification of the intermediate levels happens even if the structure is merely queried for a nested value (unless those levels already exist).
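The autovivification behaviour described above can be seen in a few lines. This is an illustrative sketch only; the ports 22 and 23 and the user root/root are made-up values, not taken from the log.

```perl
use strict;
use warnings;

my %rept;

# Incrementing a deeply nested slot creates every level on the way down.
$rept{64400}{'user1/test123'}{failed}++;

# Merely querying a deep key autovivifies the intermediate level:
# port 22 now exists as a key holding an empty hash.
my $seen = exists $rept{22}{'root/root'};
print "port 22 exists after query: ",
      (exists $rept{22} ? "yes" : "no"), "\n";            # yes

# Checking one level at a time avoids the side effect.
my $safe = exists $rept{23} && exists $rept{23}{'root/root'};
print "port 23 exists after guarded query: ",
      (exists $rept{23} ? "yes" : "no"), "\n";            # no
```

The guarded form matters when the hash is later traversed for a report, since an accidentally created port would otherwise appear as a key with no users under it.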

After the first iteration the hash is

%rept = { '64400' => { 'user1/test123' => { 'failed' => 1 } } }

In the second iteration the same port is seen but with a new user, so new data is added to the second-level anonymous hash. A key for the new user is created, and its value is a (new) anonymous hash holding status => count. The whole hash is:

%rept = { 
    '64400' => { 
        'user1/test123'  => { 'failed' => 1 },
        'user1/test1234' => { 'failed' => 1 },
    } 
}

In the next iteration the same port is seen again, with one of the already existing users and, as it happens, with a status (failed) that also exists already. Thus the count for that status is incremented.

The whole structure can be conveniently viewed using, for example, the Data::Dumper package. The commented-out lines in the code above would produce


$VAR1 = {
    '64400' => {
        'user1/test123' => {
                                'failed' => 2
                           },
        'user1/test1234' => {
                                'failed' => 1
                            }
                }
        };
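Perl does not guarantee hash key order, so the users may be listed in a different order from run to run. As a small sketch, assuming sample data that mirrors the dump above, the $Data::Dumper::Sortkeys and $Data::Dumper::Indent settings give a stable and more compact listing:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # list hash keys in sorted order
$Data::Dumper::Indent   = 1;    # fixed, shallow indentation

# Sample data mirroring the structure shown above.
my %rept = (
    64400 => {
        'user1/test123'  => { failed => 2 },
        'user1/test1234' => { failed => 1 },
    },
);

my $dump = Dumper(\%rept);
print $dump;
```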

As we keep processing lines, new keys are added as needed (ports, users, statuses), with the full hierarchy created down to the count (1 the first time); if an existing key is encountered, its count is incremented. The generated data structure can then be traversed and used, for example as seen in the code. Please also see the plentiful documentation for more on that.
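Since the question asks for the result to be written to a file, the traversal can be wrapped in a routine that prints to any filehandle. This is a sketch under assumptions: write_report is a hypothetical helper, the sample hash stands in for the one built from the log, ports are sorted numerically, and the header uses the spelling from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: print one CSV row per (port, user, status)
# combination to the given filehandle.
sub write_report {
    my ($fh, $rept) = @_;
    print {$fh} "Port,Status,Occurrences\n";
    for my $port (sort { $a <=> $b } keys %$rept) {
        for my $usr (sort keys %{ $rept->{$port} }) {
            for my $stat (sort keys %{ $rept->{$port}{$usr} }) {
                print {$fh} "$port,$stat,$rept->{$port}{$usr}{$stat}\n";
            }
        }
    }
    return;
}

# Sample data standing in for the hash built from the log.
my %rept = (
    64400 => {
        'user1/test123'  => { failed => 2 },
        'user1/test1234' => { failed => 1 },
    },
);

write_report(\*STDOUT, \%rept);
```

To write to a file instead, a handle obtained from open(my $out, '>', 'report.csv') could be passed in place of \*STDOUT; the file name here is only an example.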
