Complex parsing of a text file in PHP
Problem description
I am trying to parse a text file in the following format. Each entry is on a single line.
SAMPLE.TXT
2016-02-24 13:54:23 Local0.Info 172.16.120.4 1 1456311263.500015263 ASD_MX600 urls src=172.16.41.15:62490 dst=144.76.76.148:80 mac=00:1B:0D:63:84:00 user=CN=Smith\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab agent='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY' request: GET http://something.com/theme/image.php/clean/page/1455532301/icon
2016-02-24 13:54:23 Local0.Info 172.16.120.4 1 1456311263.500097075 ASD_MX600 urls src=172.16.41.15:62485 dst=144.76.76.148:80 mac=00:1B:0D:63:84:00 user=CN=Smith\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab agent='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY' request: GET http://somethingelse.com/theme/image.php/clean/core/1455532301/f/pdf-24
I need to do the following:
1. Parse the entire file into an array. //DONE
2. Pick up everything after 1 145... (which will end up in [3] of the array) and parse it further so that I have the following breakdowns.
- urls
- src=172.16.41.15:62490
- dst=144.76.76.148:80
- mac=00:1B:0D:63:84:00
- user=CN=Smith\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab
- agent='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY'
- request: GET
- http://something.com/theme/image.php/clean/page/1455532301/icon
I am having a hard time getting the syntax right for the second parse inside the main loop. I get the entire long section at index [3], and I think I am using explode() correctly to split it on ' ', but after that I am lost. How do I get hold of the data as shown above? My code so far:
<?php
$txt_file = file_get_contents('C:\sample.txt');
$rows = explode("\n", $txt_file);
array_shift($rows);
foreach($rows as $row => $data)
{
//get row data
$row_data = explode(' ', $data); //chop each row first based on bigger space
//--------------------------
$info[$row]['timestamp'] = $row_data[0];
// $info[$row]['localinfo'] = $row_data[1];
$info[$row]['ip'] = $row_data[2];
$info[$row]['other'] = $row_data[3]; //This is where LONGEST string exists
//--------------------------
$row_data1 = explode(' ', $row_data[3]); //chop index item based on smaller space
$rowd_data2[$row_data1]['urlsflows'] = $row_data1[3];
//display data
// echo 'Row ' . $row . ' TIMESTAMP: ' . $info[$row]['timestamp'] . '<br />';
// echo 'Row ' . $row . ' LOCALINFO: ' . $info[$row]['localinfo'] . '<br />';
// echo 'Row ' . $row . ' IP: ' . $info[$row]['ip'] . '<br />';
//--The line below is where I am lost. Kindly help.
echo $rowd_data2[$row_data1]['urlsflows'];
} //end of for loop
?>
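A side note on the loop above: $row_data1 is an array, and PHP does not accept an array as an array key, so the $rowd_data2[$row_data1] assignment cannot work. If you would rather stay with explode(), its third argument (a limit) keeps the long tail in one piece. The following is only a minimal sketch. It assumes PHP 7 or later and single-space separators as in the sample shown here (the real file may use tabs), and the function name parseLogLine and the array keys are made up for illustration.

```php
<?php
// Sketch only: assumes fields are separated by single spaces, as in the sample.
function parseLogLine(string $line): array
{
    // A limit of 8 keeps everything from "urls" onward together in $parts[7].
    $parts = explode(' ', trim($line), 8);
    if (count($parts) < 8) {
        return []; // not enough fields; the caller can skip this line
    }

    $entry = [
        'timestamp' => $parts[0] . ' ' . $parts[1],
        'ip'        => $parts[3],
    ];

    // key=value pairs: a value is either a '...' quoted string or a run of non-spaces.
    // The lookbehind only accepts keys that start a whitespace-separated token.
    preg_match_all("/(?<!\\S)(\\w+)=('[^']*'|\\S+)/", $parts[7], $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $entry[$pair[1]] = $pair[2];
    }

    // "request: GET http://..." uses a colon instead of "=", so match it separately.
    if (preg_match('/request: (\S+) (\S+)/', $parts[7], $m)) {
        $entry['method'] = $m[1];
        $entry['url']    = $m[2];
    }

    return $entry;
}

// Usage with the first line of SAMPLE.TXT:
$sample = "2016-02-24 13:54:23 Local0.Info 172.16.120.4 1 1456311263.500015263 ASD_MX600 urls "
    . "src=172.16.41.15:62490 dst=144.76.76.148:80 mac=00:1B:0D:63:84:00 "
    . "user=CN=Smith\\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab "
    . "agent='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY' "
    . "request: GET http://something.com/theme/image.php/clean/page/1455532301/icon";
print_r(parseLogLine($sample));
```

In this sketch the agent value keeps its surrounding single quotes, and a line with fewer than eight fields yields an empty array rather than an error.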
This code works for the input file:
<?php
// Read the whole file and split it into lines.
$rows = explode("\n", file_get_contents('SAMPLE.TXT'));
$result = array();
foreach ($rows as $row) {
    // Skip blank lines (for example the one after the final newline).
    if (trim($row) == "") {
        continue;
    }
    // Capture the leading "date time" pair.
    $timeMatches = array();
    $reTime = "/([0-9-]* [0-9:]*) /";
    preg_match($reTime, $row, $timeMatches);
    // Capture each key=value field, then the request method and the URL.
    $re = "/src=(.*) dst=(.*) mac=(.*) user=(.*) agent=(.*) request: (.*) (.*)/";
    $matches = array();
    preg_match($re, $row, $matches);
    $result[] = array(
        'time'   => $timeMatches[1],
        'src'    => $matches[1],
        'dst'    => $matches[2],
        'mac'    => $matches[3],
        'user'   => $matches[4],
        'agent'  => $matches[5],
        'method' => $matches[6],
        'url'    => $matches[7],
    );
}
var_dump($result);
The output of var_dump($result) is:
array(2) {
[0]=>
array(8) {
["time"]=>
string(20) "2016-02-24 13:54:23"
["src"]=>
string(18) "172.16.41.15:62490"
["dst"]=>
string(16) "144.76.76.148:80"
["mac"]=>
string(17) "00:1B:0D:63:84:00"
["user"]=>
string(49) "CN=Smith\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab"
["agent"]=>
string(76) "'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY'"
["method"]=>
string(3) "GET"
["url"]=>
string(63) "http://something.com/theme/image.php/clean/page/1455532301/icon"
}
[1]=>
array(8) {
["time"]=>
string(20) "2016-02-24 13:54:23"
["src"]=>
string(18) "172.16.41.15:62485"
["dst"]=>
string(16) "144.76.76.148:80"
["mac"]=>
string(17) "00:1B:0D:63:84:00"
["user"]=>
string(49) "CN=Smith\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab"
["agent"]=>
string(76) "'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY'"
["method"]=>
string(3) "GET"
["url"]=>
string(71) "http://somethingelse.com/theme/image.php/clean/core/1455532301/f/pdf-24"
}
}
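The regular expression in the answer uses greedy (.*) groups and never checks whether preg_match() actually matched, so a line in a different layout would produce undefined-index notices. A possible variant, offered only as a sketch, uses named groups and returns null for lines that do not fit. It assumes PHP 7.1 or later, that the agent is always wrapped in single quotes, and that src, dst, mac, method and URL contain no spaces. The function name parseUrlLogLine is made up for illustration.

```php
<?php
// Sketch: same fields as the answer, but with named groups and a match check.
function parseUrlLogLine(string $line): ?array
{
    $pattern = '/^(?<time>[0-9-]+\s+[0-9:]+)\s.*?\bsrc=(?<src>\S+)\s+dst=(?<dst>\S+)'
        . '\s+mac=(?<mac>\S+)\s+user=(?<user>.*?)\s+agent=\'(?<agent>[^\']*)\''
        . '\s+request:\s+(?<method>\S+)\s+(?<url>\S+)/';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // the line does not follow the expected layout
    }

    // preg_match() returns numeric and named keys; keep only the named ones.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

// Usage with the second line of SAMPLE.TXT:
$sample = "2016-02-24 13:54:23 Local0.Info 172.16.120.4 1 1456311263.500097075 ASD_MX600 urls "
    . "src=172.16.41.15:62485 dst=144.76.76.148:80 mac=00:1B:0D:63:84:00 "
    . "user=CN=Smith\\John,OU=S-HS,OU=SAcc,DC=abc,DC=org,DC=ab "
    . "agent='Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 seb/2.0 SEBKEY' "
    . "request: GET http://somethingelse.com/theme/image.php/clean/core/1455532301/f/pdf-24";
print_r(parseUrlLogLine($sample));
```

Unlike the answer's output, this sketch strips the single quotes around the agent string. The lazy user group also tolerates spaces inside the distinguished name, which the sample does not contain but real directory entries might.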