Unix shell script to search for error codes in thousands of files, then print the counts in a text file
Problem description
Each day I need to find more than 150 eventType/errorCode pairs in 1700 files. That means I have to loop over the 1700 files, count the occurrences of each of the 150+ eventType/errorCode pairs, and write those counts to a text file as a daily report.
I have placed the eventType/errorCode values in a text file, one comma-separated pair per line:
10008,4569
10008,4568
10003,1200
40000,4006
My initial code:
#!/bin/bash
DT=$(date +%Y%m%d%H)                           # current date and hour
fileName=$(date --date="-1 day" +"%Y%m%d")     # file name stamp for yesterday's date
Yesterday=$(date --date="-1 day" +"%Y-%m-%d")  # yesterday's date
cd "/advdata/datashareB/FFFF/continuousDownstream/$Yesterday" || exit 1
### Here I want to loop through the text file that contains the errorCodes/eventTypes and search for them in the 1700 files. In the loop I have to execute the following command:
### eventExport -printEvents -file Run_${fileName}*_*.tar -filter "ErrorCode=4569;EventType=10008" -names -silent | wc -l
The output should be written to a text file in the following format:
Date 10008/4569 10008/4568 10003,1200 ... ...
20160621 100 12800 58
........ .... ..... ... .... ... ...
where the first row is the header and each following row holds the total counts of the eventType/errorCode pairs for one day.
Every day the script should append the values as a new line in the output text file.
How do I write this loop?
The files are tar files with names like Run_20160622_105700_02of04.tar. eventExport reads those tar files and extracts the error codes and event types given in its arguments. The command looks like:
eventExport -printEvents -file Run_20160526_09*_*.tar -filter "ErrorCode=4569;EventType=10008" -names -silent | head | awk -F, '{OFS =","; print $3, $8,$9, $14}'
The output is:
AccessKey="706385970",EventType=10008,OrigEventTime=2016-06-21 23:29:42.000,ErrorCode=4569
Here, each eventType is associated with an errorCode. I have more than 150 eventTypes that I want to find and count in the tar files. More than 1700 tar files are generated per day.
Recommended answer
Here is a GNU awk script (as its own script file, for reusability) that reads the list of event types and error codes, parses the log file, and reports the count of matching event type/error code pairs for each date.
#!/usr/bin/awk -f
/^[0-9]+,[0-9]+$/ {
    # this line contains event type and error code
    split($0, data, ",");
    keys[data[1]][data[2]] = 0;
}
match($0, "EventType=([0-9]+).*ErrorCode=([0-9]+)", key) {
    # this line is from the log file
    if (key[1] in keys && key[2] in keys[key[1]]) {
        match($0, "OrigEventTime=([0-9-]+)", date);
        datecount[date[1]][key[1]][key[2]]++;
    }
}
END {
    for (d in datecount) {
        for (k1 in datecount[d]) {
            for (k2 in datecount[d][k1]) {
                printf("%s\t%s/%s\t%d\n",
                       d, k1, k2, datecount[d][k1][k2]);
            }
        }
    }
}
Running it (note that this requires GNU awk):
$ awk -f script.awk codes.txt run.log
The output is not quite in the format that you wanted, but I'm hoping it's close enough:
2016-06-11 10008/4569 1
2016-06-21 10008/4569 4
2016-06-21 40000/4006 1
(I duplicated the data that you gave us a few times and changed a date and one of the event type/error code pairs.)
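If the wide layout from the question is needed (one header row, then one row per date), the long output above could be pivoted in a second pass. The following is only a sketch under assumptions: the function name pivot_counts and the file names codes.txt and counts.txt are made up for illustration, counts.txt is assumed to hold the tab-separated output shown above, and codes.txt is assumed to be non-empty. It uses plain POSIX awk features, so it does not need GNU awk.

```shell
#!/bin/bash
# pivot_counts CODES_FILE COUNTS_FILE  (hypothetical helper, not part of the answer)
# Turns "date<TAB>eventType/errorCode<TAB>count" lines into one row per date,
# with one column per pair in the order the pairs appear in CODES_FILE.
pivot_counts() {
    awk -F'\t' '
        # First file: "eventType,errorCode" lines define the column order.
        NR == FNR {
            if ($0 ~ /^[0-9]+,[0-9]+$/) {
                split($0, c, ",")
                cols[++ncols] = c[1] "/" c[2]
            }
            next
        }
        # Second file: remember each date once and add up its counts.
        {
            if (!($1 in seen)) { seen[$1] = 1; dates[++ndates] = $1 }
            count[$1, $2] += $3
        }
        END {
            printf "Date"
            for (i = 1; i <= ncols; i++) printf "\t%s", cols[i]
            printf "\n"
            for (d = 1; d <= ndates; d++) {
                printf "%s", dates[d]
                # Pairs that never occurred on this date print as 0.
                for (i = 1; i <= ncols; i++)
                    printf "\t%d", count[dates[d], cols[i]]
                printf "\n"
            }
        }
    ' "$1" "$2"
}
```

Called as pivot_counts codes.txt counts.txt, this prints the header followed by one tab-separated row per date, in the order the dates first appear in counts.txt. For a daily report that grows by one line per day, the header would be written once and only the data row appended on later runs.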
UPDATE: I reworked the script for GNU awk versions older than 4.0 (which do not support arrays of arrays):
#!/usr/bin/awk -f
/^[0-9]+,[0-9]+$/ {
    # this line contains event type and error code
    split($0, data, ",");
    keys[data[1],data[2]] = 1;
}
match($0, "EventType=([0-9]+).*ErrorCode=([0-9]+)", key) {
    # this line is from the log file
    # ("in" tests membership without creating an empty entry for every unmatched pair)
    if ((key[1], key[2]) in keys) {
        match($0, "OrigEventTime=([0-9-]+)", date);
        count[date[1],key[1],key[2]]++;
    }
}
END {
    for (comb in count) {
        split(comb, field, SUBSEP);
        printf("%s\t%s/%s\t%s\n", field[1], field[2], field[3], count[comb]);
    }
}
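For completeness, the loop the question literally asks about (one eventExport call per pair) could be driven from the codes file along these lines. This is a rough sketch, not a tested solution: make_filters is a made-up helper name, the filter format is copied from the question, and the eventExport call appears only in a comment because its behavior is not verified here. Running the export once per pair will likely be much slower than the single awk pass above.

```shell
#!/bin/bash
# make_filters CODES_FILE  (hypothetical helper)
# Prints one filter string per "eventType,errorCode" line of the codes file,
# in the "ErrorCode=...;EventType=..." form used in the question.
make_filters() {
    local eventType errorCode
    # The "|| [[ -n ... ]]" part keeps a final line that lacks a trailing newline.
    while IFS=, read -r eventType errorCode || [[ -n $eventType ]]; do
        # Skip blank or malformed lines.
        [[ $eventType =~ ^[0-9]+$ && $errorCode =~ ^[0-9]+$ ]] || continue
        printf 'ErrorCode=%s;EventType=%s\n' "$errorCode" "$eventType"
    done < "$1"
}

# Usage sketch (eventExport command line taken from the question, unverified):
#   make_filters codes.txt | while IFS= read -r filter; do
#       eventExport -printEvents -file Run_"$fileName"*_*.tar \
#           -filter "$filter" -names -silent < /dev/null | wc -l
#   done
```

Each count printed by the loop corresponds to one line of the codes file, in file order, so the counts can be joined with tabs to form the day's row of the report.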