Unix shell script to search for error codes in thousands of files, then print the counts in a text file


Problem description

Each day I need to find more than 150 eventType/errorCode pairs in 1700 files. That means I have to loop over the 1700 files, count the occurrences of each of the 150+ eventType/errorCode pairs, and write those counts to a text file as a daily report.

I have put the eventType/errorCode values in a text file, one comma-separated pair per line:

10008,4569
10008,4568
10003,1200
40000,4006

My initial code:

#!/bin/bash
DT=$(date +%Y%m%d%H)                            # current date and hour
fileName=$(date --date="-1 day" +"%Y%m%d")      # date part of yesterday's file names
Yesterday=$(date --date="-1 day" +"%Y-%m-%d")   # yesterday's date
cd "/advdata/datashareB/FFFF/continuousDownstream/$Yesterday" || exit 1

### Here I want to loop through the text file that contains the eventType/errorCode pairs and search for them in the 1700 files. In the loop I have to execute the following command:
###    eventExport -printEvents -file Run_"$fileName"*_*.tar -filter "ErrorCode=4569;EventType=10008" -names -silent | wc -l

The output should be written to a text file in the following format:

Date        10008/4569  10008/4568  10003/1200  ...
20160621    100         12800       58          ...
........    ....        .....       ...         ...

where the first row is the header and the second row holds the total count for each eventType/errorCode pair.

Every day the script should append the new values as a new line in the output file (a text file).

How do I write this loop?

The files are tar files named like Run_20160622_105700_02of04.tar. eventExport reads those tar files and extracts the events whose error codes and event types match the -filter argument. The command looks like this:

eventExport -printEvents -file Run_20160526_09*_*.tar -filter "ErrorCode=4569;EventType=10008" -names -silent | head | awk -F, '{OFS=","; print $3, $8, $9, $14}'

The output is:

AccessKey="706385970",EventType=10008,OrigEventTime=2016-06-21 23:29:42.000,ErrorCode=4569

Here, each eventType is associated with an errorCode. I have more than 150 such pairs, and I want to find them in the tar files and get their counts. More than 1700 tar files are generated per day.

Recommended answer

Here is a GNU awk script (kept in its own script file, for reusability) that reads the list of event types and error codes, scans the log file, and reports the number of matching event type/error code pairs for each date.

#!/usr/bin/awk -f

/^[0-9]+,[0-9]+$/ {
    # this line contains event type and error code

    split($0, data, ",");
    keys[data[1]][data[2]] = 0;
}

match($0, "EventType=([0-9]+).*ErrorCode=([0-9]+)", key) {
    # this line is from the log file

    if (key[1] in keys && key[2] in keys[key[1]]) {
        match($0, "OrigEventTime=([0-9-]+)", date);
        datecount[date[1]][key[1]][key[2]]++;
    }
}

END {
    for (d in datecount) {
        for (k1 in datecount[d]) {
            for (k2 in datecount[d][k1]) {
                printf("%s\t%s/%s\t%d\n",
                        d, k1, k2, datecount[d][k1][k2]);
            }
        }
    }
}

Running it (note that this requires GNU awk):

$ awk -f script.awk codes.txt run.log
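The command above assumes that run.log already holds the text printed by eventExport. In the question's setup the events come straight from the tar files, so the two could be wired together roughly as follows. This is only a sketch: the function name export_events and its arguments are made up for illustration, the eventExport flags are copied from the question as-is, and running the tool once per pair is simply the approach the question describes.

```shell
#!/bin/bash
# Sketch: run the question's eventExport command once per eventType/errorCode
# pair and print every matching event line on stdout.
#   $1 = codes file with one "eventType,errorCode" pair per line
#   $2 = date part of the tar file names, e.g. 20160621
# export_events is a hypothetical helper name, not part of eventExport.
export_events() {
    codes_file=$1
    file_date=$2
    # "|| [ -n ... ]" keeps a last line that has no trailing newline.
    while IFS=, read -r event_type error_code || [ -n "$event_type" ]; do
        # Skip blank or malformed lines.
        case $event_type in ''|*[!0-9]*) continue ;; esac
        case $error_code in ''|*[!0-9]*) continue ;; esac
        # Same flags as in the question; only the filter changes per pair.
        # stdin is redirected so the command cannot swallow the codes file.
        eventExport -printEvents -file Run_"${file_date}"*_*.tar \
            -filter "ErrorCode=${error_code};EventType=${event_type}" \
            -names -silent < /dev/null
    done < "$codes_file"
}

# Example, run in the directory that holds yesterday's tar files
# ("-" makes awk read the event lines from stdin):
#   export_events codes.txt "$(date --date='-1 day' +%Y%m%d)" |
#       awk -f script.awk codes.txt -
```

Piping into the awk script avoids a temporary log file and keeps the per-date counting in one place.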

The output is not quite in the format that you wanted, but I'm hoping it's close enough:

2016-06-11  10008/4569  1
2016-06-21  10008/4569  4
2016-06-21  40000/4006  1

(I duplicated the data that you gave us a few times and changed a date and one of the event type/error code pairs.)
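If the one-row-per-date layout from the question is needed, the long output shown above can be pivoted in a second step. The following is a minimal sketch, not part of the original answer: the function names report_header and report_rows and the file names codes.txt and report.txt are assumptions, the columns follow the order of the codes file, and pairs with no events on a date are printed as 0.

```shell
#!/bin/bash
# Sketch: pivot "date<TAB>type/code<TAB>count" lines into one row per date.
# report_header and report_rows are hypothetical helper names.

# Print the header row: "Date" followed by one type/code column per pair.
#   $1 = codes file with one "eventType,errorCode" pair per line
report_header() {
    awk -F, 'BEGIN { printf "Date" }
             /^[0-9]+,[0-9]+$/ { printf "\t%s/%s", $1, $2 }
             END { print "" }' "$1"
}

# Read long-format counts on stdin and print one tab-separated row per date.
#   $1 = codes file (assumed non-empty); it fixes the column order
report_rows() {
    awk -F'\t' '
        NR == FNR {                       # first file: the codes list
            if ($0 ~ /^[0-9]+,[0-9]+$/) {
                split($0, p, ",")
                cols[++n] = p[1] "/" p[2]
            }
            next
        }
        {                                 # stdin: date, type/code, count
            count[$1, $2] = $3
            if (!($1 in seen)) { seen[$1] = 1; dates[++nd] = $1 }
        }
        END {
            for (i = 1; i <= nd; i++) {
                d = dates[i]
                gsub(/-/, "", d)          # 2016-06-21 -> 20160621, as in the requested report
                printf "%s", d
                for (j = 1; j <= n; j++)
                    printf "\t%d", count[dates[i], cols[j]]
                print ""
            }
        }' "$1" -
}

# Example of a daily append (the header is written only once):
#   [ -s report.txt ] || report_header codes.txt > report.txt
#   awk -f script.awk codes.txt run.log | report_rows codes.txt >> report.txt
```

Rows come out in the order in which the dates first appear in the input, so the long output may need to be passed through sort first if several dates are present.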

UPDATE: I reworked the script for GNU awk versions older than 4.0 (which do not support arrays of arrays):

#!/usr/bin/awk -f

/^[0-9]+,[0-9]+$/ {
    # this line contains event type and error code

    split($0, data, ",");
    keys[data[1],data[2]] = 1;
}

match($0, "EventType=([0-9]+).*ErrorCode=([0-9]+)", key) {
    # this line is from the log file

    if ((key[1], key[2]) in keys) {
        match($0, "OrigEventTime=([0-9-]+)", date);
        count[date[1],key[1],key[2]]++;
    }
}

END {
    for (comb in count) {
        split(comb, field, SUBSEP);
        printf("%s\t%s/%s\t%s\n", field[1], field[2], field[3], count[comb]);
    }
}
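Both scripts above still rely on the three-argument form of match(), which is a GNU awk extension. Where only a POSIX awk is available, the same counting could be done along the following lines. This is a sketch under assumptions rather than a drop-in replacement: the wrapper name count_events_portable is made up, and the field extraction assumes that each value consists of digits and dashes directly after "Name=", as in the sample line from the question.

```shell
#!/bin/bash
# Sketch: count matching (date, eventType, errorCode) triples using only
# POSIX awk features (match() with RSTART/RLENGTH, no array argument).
#   $1 = codes file; remaining arguments = log files ("-" for stdin)
# count_events_portable is a hypothetical helper name.
count_events_portable() {
    awk '
        # Return the digits-and-dashes value that follows "name=", or "".
        function field(line, name) {
            if (match(line, name "=[0-9-]+"))
                return substr(line, RSTART + length(name) + 1,
                              RLENGTH - length(name) - 1)
            return ""
        }
        /^[0-9]+,[0-9]+$/ {               # line from the codes file
            split($0, pair, ",")
            wanted[pair[1], pair[2]] = 1
            next
        }
        {                                 # line from the log
            type = field($0, "EventType")
            code = field($0, "ErrorCode")
            if (type != "" && ((type, code) in wanted)) {
                day = field($0, "OrigEventTime")
                count[day, type, code]++
            }
        }
        END {
            for (k in count) {
                split(k, f, SUBSEP)
                printf "%s\t%s/%s\t%d\n", f[1], f[2], f[3], count[k]
            }
        }' "$@"
}

# Example:
#   count_events_portable codes.txt run.log
```

The output has the same three tab-separated columns as the scripts above, in no particular order.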
