Complex parsing a text file in PHP

Views: 114 | Posted: 2016/6/3 21:31:29 | Tags: php, arrays, file, parsing, text


Problem description

So I am trying to parse a TXT file which has the following format. Each entry is on a single line.

SAMPLE.TXT

2016-02-24 13:54:23 Local0.Info 172.16.120.4    1 1456311263.500015263 ASD_MX600 urls src=172.16.41.15:62490 dst=144.76.76.148:80 mac=00:1B:0D:63:84:00 user=CN=Smith\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab agent='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY' request: GET http://something.com/theme/image.php/clean/page/1455532301/icon

2016-02-24 13:54:23 Local0.Info 172.16.120.4    1 1456311263.500097075 ASD_MX600 urls src=172.16.41.15:62485 dst=144.76.76.148:80 mac=00:1B:0D:63:84:00 user=CN=Smith\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab agent='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY' request: GET http://somethingelse.com/theme/image.php/clean/core/1455532301/f/pdf-24

I need to do the following:
1. Parse the entire file into an array. //DONE
2. Pick up everything after 1 145... (which will end up in [3] of the array) and parse it further so that I have the following breakdowns.
- urls
- src=172.16.41.15:62490
- dst=144.76.76.148:80
- mac=00:1B:0D:63:84:00
- user=CN=Smith\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab
- agent='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY'
- request: GET
- http://something.com/theme/image.php/clean/page/1455532301/icon

I am having a hard time getting the syntax right for the second parse within the main loop. I get the entire giant section at index 3 ([3]), and I think I am also using explode() correctly to split it on ' ', but then I am lost. How do I get hold of the data as shown above? My code so far:

<?php

$txt_file    = file_get_contents('C:\sample.txt');
$rows        = explode("\n", $txt_file);
array_shift($rows);

foreach($rows as $row => $data)
{
    //get row data
    $row_data = explode('   ', $data);   //chop each row first based on bigger space

  //--------------------------
    $info[$row]['timestamp']           = $row_data[0];
   // $info[$row]['localinfo']         = $row_data[1];
    $info[$row]['ip']  = $row_data[2];
    $info[$row]['other']       = $row_data[3]; //This is where LONGEST string exists
  //--------------------------

    $row_data1 = explode(' ', $row_data[3]);   //chop index item based on smaller space

    $rowd_data2[$row_data1]['urlsflows']           = $row_data1[3];


     //display data
  //  echo 'Row ' . $row . ' TIMESTAMP: ' . $info[$row]['timestamp'] . '<br />';
   // echo 'Row ' . $row . ' LOCALINFO: ' . $info[$row]['localinfo'] . '<br />';
   // echo 'Row ' . $row . ' IP: ' . $info[$row]['ip'] . '<br />';

  //--The line below is where I am lost. Kindly help.

    echo $rowd_data2[$row_data1]['urlsflows'];


      } //end of for loop

?>

Solution

This code works for the input file:

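<?php
// Read the whole log and split it into lines.
$rows = explode("\n", file_get_contents('SAMPLE.TXT'));
$result = array();

// Leading "YYYY-MM-DD HH:MM:SS" timestamp, followed by a space.
$reTime = "/([0-9-]* [0-9:]*) /";
// The key=value fields in their fixed order; the last two groups
// are the HTTP method and the URL.
$re = "/src=(.*) dst=(.*) mac=(.*) user=(.*) agent=(.*) request: (.*) (.*)/";

foreach ($rows as $row) {
    // Skip blank lines (the sample file separates entries with them).
    if (trim($row) == "") {
        continue;
    }
    $timeMatches = array();
    $matches = array();
    // Skip lines that do not follow the expected format rather than
    // reading undefined offsets from an empty match array.
    if (!preg_match($reTime, $row, $timeMatches) || !preg_match($re, $row, $matches)) {
        continue;
    }
    $result[] = array('time' => $timeMatches[1], 'src' => $matches[1]
                , 'dst' => $matches[2], 'mac' => $matches[3]
                , 'user' => $matches[4], 'agent' => $matches[5]
                , 'method' => $matches[6], 'url' => $matches[7]);
}

var_dump($result);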

The output of var_dump($result) is:

array(2) {
  [0]=>
  array(8) {
    ["time"]=>
    string(20) "2016-02-24 13:54:23"
    ["src"]=>
    string(18) "172.16.41.15:62490"
    ["dst"]=>
    string(16) "144.76.76.148:80"
    ["mac"]=>
    string(17) "00:1B:0D:63:84:00"
    ["user"]=>
    string(49) "CN=Smith\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab"
    ["agent"]=>
    string(76) "'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY'"
    ["method"]=>
    string(3) "GET"
    ["url"]=>
    string(63) "http://something.com/theme/image.php/clean/page/1455532301/icon"
  }
  [1]=>
  array(8) {
    ["time"]=>
    string(20) "2016-02-24 13:54:23"
    ["src"]=>
    string(18) "172.16.41.15:62485"
    ["dst"]=>
    string(16) "144.76.76.148:80"
    ["mac"]=>
    string(17) "00:1B:0D:63:84:00"
    ["user"]=>
    string(49) "CN=Smith\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab"
    ["agent"]=>
    string(76) "'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY'"
    ["method"]=>
    string(3) "GET"
    ["url"]=>
    string(71) "http://somethingelse.com/theme/image.php/clean/core/1455532301/f/pdf-24"
  }
}
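
The pattern above relies on greedy (.*) groups, which works for the sample lines but can capture too much if a field name such as " dst=" ever appears inside a value. As a rough sketch of a stricter variant, the same fields can be matched with named groups and narrower character classes. The function name parseLogLine and the choice to strip the quotes around the agent value are assumptions made for illustration and are not part of the original answer; the nullable return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the original answer): parse one log line
// into an associative array, or return null when the line does not match.
function parseLogLine(string $line): ?array
{
    // \S+ is used for fields that cannot contain spaces; "user" stays lazy
    // because a distinguished name may contain spaces.
    $pattern = '/^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*?'
             . ' src=(?<src>\S+)'
             . ' dst=(?<dst>\S+)'
             . ' mac=(?<mac>\S+)'
             . ' user=(?<user>.*?)'
             . " agent='(?<agent>[^']*)'"
             . ' request: (?<method>\S+) (?<url>\S+)/';

    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }

    // preg_match returns numeric keys alongside the named ones;
    // keep only the named (string) keys.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

// First line of SAMPLE.TXT from the question.
$line = "2016-02-24 13:54:23 Local0.Info 172.16.120.4    1 1456311263.500015263 ASD_MX600 urls"
      . " src=172.16.41.15:62490 dst=144.76.76.148:80 mac=00:1B:0D:63:84:00"
      . " user=CN=Smith\\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab"
      . " agent='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY'"
      . " request: GET http://something.com/theme/image.php/clean/page/1455532301/icon";

$entry = parseLogLine($line);
print_r($entry);
```

Returning null for lines that do not match lets the caller decide whether to skip them or log them, and the named keys remove the need to remember which numeric index holds which field.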
