Playing with Hashes from a FTP flow in Perl

Views: 131 Published: 2016/6/2 22:38:33 Tags: arrays perl hash ftp


Question


Ok, so I'm obviously having some issues understanding how to work with hashes. Long story short, I'm attempting to parse an FTP log and find the relevant flows for specific search criteria. Basically, say I have an IP address or a user name: the script first does a pretty simple grep to cut out any data I don't need and sends the output to an external file. If I'm searching for username testing1, it greps for testing1 and sends the output to another file called output.txt:

Dec  2 00:14:09 ftp1 ftpd[743]: USER testing1
Dec  2 00:14:09 ftp1 ftpd[743]: FTP LOGIN FROM 192.168.0.2 [192.168.0.2], testing1
Dec  2 00:30:08 ftp1 ftpd[1261]: USER testing1
Dec  2 00:30:09 ftp1 ftpd[1261]: FTP LOGIN FROM 192.168.0.4 [192.168.0.4], testing1
Dec  2 01:12:33 ftp1 ftpd[11804]: USER testing1
Dec  2 01:12:33 ftp1 ftpd[11804]: FTP LOGIN FROM 192.168.0.2 [192.168.0.2], testing1

And below is an example of the originating log data:

Dec  1 23:59:03 ftp1 ftpd[4152]: USER testing1
Dec  1 23:59:03 ftp1 ftpd[4152]: PASS password  
Dec  1 23:59:03 ftp1 ftpd[4152]: FTP LOGIN FROM 192.168.0.02 [192.168.0.2], testing1  
Dec  1 23:59:03 ftp1 ftpd[4152]: PWD  
Dec  1 23:59:03 ftp1 ftpd[4152]: CWD /test/data/  
Dec  1 23:59:03 ftp1 ftpd[4152]: TYPE Image

I then go in, take all the process IDs that I find along with the time for each ID, and put them into a hash, which is what you see below:

$VAR1 = {
      '743' => [
                 '00:1'
               ],
      '20687' => [
                   '01:3'
                 ],
      '27186' => [
                   '15:3'
                 ],
      '6929' => [
                  '12:0'
                ],
      '24771' => [
                   '09:0'
                 ],
      '11804' => [
                   '01:1'
                 ],
      '27683' => [
                   '08:3'
                 ],
      '14976' => [
                   '04:3'
                 ],
};

It looks as if the time is being put into the hash as an array. I was unable to figure out why this is happening, so I decided to work with it as an array. The following is how the hash of arrays is created:

# -------------------------------------------------------
# Extract PIDs and Time from lines, take out doubles
# -------------------------------------------------------
my $infile3 = 'output.txt';
my %pids;
my $found;
my $var;

open (INPUT2, $infile3) or die "Couldn't read $infile3.\n";

while (my $line = <INPUT2>) {
    if($line =~ /(\d{2})\:(\d)/ ) {
        my $hhmm = $1 . ":" . $2;
        if ($line =~ /ftpd\[(.*?)\]/) {
            $found = 0;
            foreach $var(keys %pids){
                if(grep $1 =~ $var, keys %pids){
                    $found = 1;
                }
            }
            if ($found == 0){
                push @{$pids{$1}}, $hhmm;

            }
        }       
    }

}

To speed things up I have decided to read all the lines that have the matching PIDs, whether they fit the flow or not, into an array so I don't have to keep reading in the originating file.

##-------------------------------------------------------
## read each line from file into an array
##-------------------------------------------------------
open (INPUT, $infile2) or die "Couldn't read $infile2.\n";

my @messages;

while (my $line = <INPUT>){
    # if there is a match to the PID then put the line in the array
    if ($line =~ /ftpd\[(.*?)\]/){
        my $mPID = $1;
        foreach my $key (keys %pids){
            if ($key =~ $mPID){
                push @messages, $line;
            }
        }  
    }
}

I'm now trying to match each line up with the PID and the time to get the flow. I'm only matching the hh:m part of the time to have a better chance of getting the entire flow, and because the chances of another flow with the same PID falling in the same timeframe are pretty slim. Eventually all these results will be sent to an internal web page.

# -------------------------------------------------------
#find flow based on PID that was found from criteria
#-------------------------------------------------------

foreach my $line(@messages){
    if(my($pid) = $line =~ m{ \[ \s*(\d+) \]: }x) {
        if($line =~ /(\d{2})\:(\d)/){
            my $time = $1 . ":" . $2;
            if ($pids{$pid}[0] =~ /$time/){
                 push $pids{$pid}[0], $line;
            }
        }
    }
}

Right now the above code for some reason is actually deleting the time from the hash once it is matched. I am unsure why this is happening.

I was able to get it working with a bash script, but it took decades to complete. Thanks to suggestions from people here I have decided to tackle it with Perl, so I am basically taking a crash course. I've read everything I can and have basic programming skills in C++, but I obviously still need a lot of work. I also got it working using arrays, but once again it was incredibly slow and I was getting a lot of flows that matched the process ID but were not the flows I was looking for. So after further suggestions I decided to work with hashes: have the process ID as the key, have a specific time referenced to that key, and then have the lines within the log that match both that key and time as the flow. I have had multiple questions on this already but A. have not explained myself clearly and B. have been trying different things as I learn. But for the record, everyone here has helped me tremendously and I hope that one day I can do the same for others on this list. For some reason I just can't get this stuff through my thick skull.

Anyways, hopefully I covered everything, I'm sure I'm starting to get on people's nerves with all these questions so I apologize.

UPDATE:

Well, I think I figured out how to make it all hashes, but it doesn't look right. I changed push @{$pids{$1}}, $hhmm; to $pids{$1}{$x} = $hhmm; which creates the following:

$VAR1 = {
          '743' => {
                     '' => '00:1'
                   },
          '20687' => {
                       '' => '01:3'
        },

But it doesn't look like it's referencing correctly so when I do print $pids{743}; all it prints is HASH(0x4caf10)

UPDATE:

Ok, I was able to put all the values into the hash as plain strings by changing push @{$pids{$1}}, $hhmm; to $pids{$1} = $hhmm; which seems to be working:

$VAR1 = {
          '743' => '00:1',
          '20687' => '01:3',
};

But now how do I check whether the value '00:1' matches another variable? This is what I currently have, and it is not working:

if($pids{$pid} == qr/$time/){
    $pids{$pid}{$time}[$y] = $line;
    $y++;
};

This is how it should look after the match is made:

$VAR1 = {
          '743' => '00:1',
          '4771' => {
                      '23:5' => [
                                  'Dec  1 23:59:23 ftp1 ftpd[4771]: USER test
',
                                  'Dec  1 23:59:23 ftp1 ftpd[4771]: PASS password
',
                                  'Dec  1 23:59:23 ftp1 ftpd[4771]: FTP LOGIN FROM 192.168.0.2 [192.168.0.2], test
',
                                  'Dec  1 23:59:23 ftp1 ftpd[4771]: CWD /home/test/
',
                                  'Dec  1 23:59:23 ftp1 ftpd[4771]: TYPE Image
',
                                  'Dec  1 23:59:23 ftp1 ftpd[4771]: PASV
',
                                  'Dec  1 23:59:23 ftp1 ftpd[4771]: RETR test
',
                                  'Dec  1 23:59:23 ftp1 ftpd[4771]: QUIT
',
                                  'Dec  1 23:59:23 ftp1 ftpd[4771]: FTP session closed
'
                                ]
                    },

Solution

You have a couple of errors in your code.

The first is that you're only pulling out one digit of the minutes:

    if($line =~ /(\d{2})\:(\d)/ ) {

should be

    if($line =~ /(\d{2})\:(\d{2})/ ) {
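As a small sketch of the two-digit capture, assuming the syslog-style line format shown in the question (the sub name extract_hhmm is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: return the "hh:mm" part of the first time-like
# pattern in a line, or undef if the line has none.
sub extract_hhmm {
    my ($line) = @_;
    return $line =~ /(\d{2}):(\d{2})/ ? "$1:$2" : undef;
}

# Sample line copied from the question's log excerpt.
my $sample = 'Dec  2 00:14:09 ftp1 ftpd[743]: USER testing1';
print extract_hhmm($sample), "\n";    # prints 00:14
```

With the original single-digit capture the same line would yield 00:1, which covers a ten-minute window rather than one minute.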

If I'm interpreting the intent of your code correctly, you're trying to find out whether you've already seen a time for a given pid so that you only set it the first time. If so, you don't need to loop through all the keys in %pid to do this. All you really need to do is

        if ($line =~ /ftpd\[(.*?)\]/) {
            $pid{$1}[0] = $hhmm unless exists $pid{$1};
        }

Notice that I'm doing an assignment rather than a push, so I will wind up with the time in the first element of the array reference.
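Put together, the "set it only the first time" idea might look like the sketch below. The sub name first_times is an assumption for illustration, the sample lines are copied from the question, and the pid pattern is narrowed to digits, which is also an assumption about the log format.

```perl
use strict;
use warnings;

# Hypothetical helper: map each pid to a one-element array holding the
# hh:mm of the first line on which that pid was seen.
sub first_times {
    my (@lines) = @_;
    my %pid;
    for my $line (@lines) {
        next unless $line =~ /(\d{2}):(\d{2})/;
        my $hhmm = "$1:$2";              # save before the next match resets $1
        next unless $line =~ /ftpd\[(\d+)\]/;
        # exists does not create the key, so only the first sighting assigns.
        $pid{$1}[0] = $hhmm unless exists $pid{$1};
    }
    return \%pid;
}

my $times = first_times(
    'Dec  2 00:14:09 ftp1 ftpd[743]: USER testing1',
    'Dec  2 00:14:09 ftp1 ftpd[743]: FTP LOGIN FROM 192.168.0.2 [192.168.0.2], testing1',
    'Dec  2 00:30:08 ftp1 ftpd[1261]: USER testing1',
    'Dec  2 00:30:09 ftp1 ftpd[1261]: FTP LOGIN FROM 192.168.0.4 [192.168.0.4], testing1',
    'Dec  2 01:12:33 ftp1 ftpd[11804]: USER testing1',
);

for my $id (sort { $a <=> $b } keys %$times) {
    print "$id => $times->{$id}[0]\n";
}
```

A single hash lookup replaces the inner loop over all keys, so the cost per line no longer grows with the number of pids already stored.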

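I think you may have meant to type "==" (or "eq" for a string comparison) instead of "=~" below, since "=~" treats $var as a pattern and matches it anywhere inside $1: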

            if(grep $1 =~ $var, keys %pids){
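A short sketch of why the operator matters here. The two pids are made-up values chosen so that one is a substring of the other, and the sub names are hypothetical.

```perl
use strict;
use warnings;

# Pattern match: the pid is used as a regex, so it also matches
# any key that merely contains it.
sub regex_hits {
    my ($pid, @keys) = @_;
    return grep { $_ =~ $pid } @keys;
}

# String equality: only the identical pid matches.
sub exact_hits {
    my ($pid, @keys) = @_;
    return grep { $_ eq $pid } @keys;
}

my @keys  = ('743', '7431');    # made-up pids for illustration
my @loose = regex_hits('743', @keys);
my @exact = exact_hits('743', @keys);

print "regex: @loose\n";    # regex: 743 7431
print "exact: @exact\n";    # exact: 743
```

When the pids are already hash keys, exists $pids{$pid} gives the exact answer without any grep at all.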

Presumably you need to capture more information than just the time, such as the user name, transfer type, etc. so you may find it better to use a hash reference instead of an array reference under the pid. That way you can tag and easily find your information:

        if ($line =~ /ftpd\[(.*?)\]/) {
            my $pid = $1;    # copy the capture before the next match overwrites $1
            $pid{$pid}{time} = $hhmm unless exists $pid{$pid};
            if ($line =~ /USER (\w+)/) {
                $pid{$pid}{user} = $1;
            }
        }
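Here is a self-contained sketch of that hash-of-hashes layout, using lines copied from the question. The sub name build_sessions and the digits-only pid pattern are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build { pid => { time => 'hh:mm', user => 'name' } }.
sub build_sessions {
    my (@lines) = @_;
    my %pid;
    for my $line (@lines) {
        next unless $line =~ /(\d{2}):(\d{2})/;
        my $hhmm = "$1:$2";
        next unless $line =~ /ftpd\[(\d+)\]/;
        my $id = $1;                     # keep the pid; $1 changes on the next match
        $pid{$id}{time} = $hhmm unless exists $pid{$id};
        $pid{$id}{user} = $1 if $line =~ /USER (\w+)/;
    }
    return \%pid;
}

my $sessions = build_sessions(
    'Dec  2 00:14:09 ftp1 ftpd[743]: USER testing1',
    'Dec  2 00:14:09 ftp1 ftpd[743]: FTP LOGIN FROM 192.168.0.2 [192.168.0.2], testing1',
    'Dec  2 01:12:33 ftp1 ftpd[11804]: USER testing1',
);

# Dereference a field; printing $sessions->{743} itself would only show
# something like HASH(0x...), because the value is a reference.
print "pid 743: $sessions->{743}{user} at $sessions->{743}{time}\n";
```

Named fields keep the record readable as more attributes (transfer type, client address, and so on) are added later.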

Of course, you'll want to index according to whatever makes most sense for your purposes to make your searches fast. For instance, you might keep a second hash indexed by time:

           $time{$hhmm}{pid} = $pid;

or even keep a list of all the pids associated with a given user:

           push @{$user{$1}}, $pid;
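Both secondary indexes can be filled in the same pass over the log. The sketch below assumes the line format from the question; build_indexes is a hypothetical name, and the time index keeps only the last pid seen for each minute, exactly as the one-line assignment above does.

```perl
use strict;
use warnings;

# Hypothetical helper: return (\%time, \%user), where
#   $time{hh:mm}{pid} is the last pid seen in that minute and
#   $user{name}       is a list of pids that issued USER name.
sub build_indexes {
    my (@lines) = @_;
    my (%time, %user);
    for my $line (@lines) {
        next unless $line =~ /(\d{2}):(\d{2})/;
        my $hhmm = "$1:$2";
        next unless $line =~ /ftpd\[(\d+)\]/;
        my $pid = $1;
        $time{$hhmm}{pid} = $pid;
        push @{ $user{$1} }, $pid if $line =~ /USER (\w+)/;
    }
    return (\%time, \%user);
}

my ($time, $user) = build_indexes(
    'Dec  2 00:14:09 ftp1 ftpd[743]: USER testing1',
    'Dec  2 00:30:08 ftp1 ftpd[1261]: USER testing1',
    'Dec  2 01:12:33 ftp1 ftpd[11804]: USER testing1',
);

print "testing1 pids: @{ $user->{testing1} }\n";    # testing1 pids: 743 1261 11804
print "pid at 00:30: $time->{'00:30'}{pid}\n";      # pid at 00:30: 1261
```

If several sessions can start in the same minute, storing a list under each time (as the user index does) would avoid overwriting earlier pids.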
