使用bash中连续添加的日志文件 [英] Working with continuously appended log file in bash

查看:90
本文介绍了使用bash中连续添加的日志文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个连续添加的日志文件,如下所示.我的目的是每分钟查找一次拒绝计数.微小的cron无法以恰好在9:15:00运行的方式进行同步,这使这个问题变得棘手.

I have a continuously appended logfile like below. My intent is to find REJECT counts every minute. A minutely cron cannot be synchronized in a way thatit runs at exactly 9:15:00 is what creates this problem tricky.

20160302-09:15:01.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.287619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.289619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.290619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.291619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.295619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.297619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:16:02.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:16:03.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:17:02.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:17:07.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:18:07.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:19:06.283619074 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:19:07.283619074 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:19:07.283619075 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:20:07.283619075 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:21:07.283619075 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:22:07.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274

预期输出

8 9:15 REJECT 7
10 9:16 REJECT 2
12 9:17 REJECT 2
#continue appending every 2 minutes or so by reading more logged lines eg. 9:19 then 9:22 and so on

我的代码:

#!/usr/bin/bash
today=$(date +%Y%m%d)
touch ${today}.log
counter=$(tail -1 ${today}.log | cut -d" " -f1) #get line number to tail from

re='^[0-9]+$'
if ! [[ $counter =~ $re ]] ; then
   counter=1
fi
echo "$counter"

echo "tail -n +$counter $1 | grep "ErrorCode:100107" |cut -d'|' -f1 | awk -F '[:-]' '{curr=$2":"$3} (prev!="") && (curr!=prev){print NR, prev, $NF, cnt; cnt=0} {cnt++; prev=curr}' >> ${today}.log"
tail -n +$counter $1 | grep "ErrorCode:100107" |cut -d'|' -f1 | awk -F '[:-]' '{curr=$2":"$3} (prev!="") && (curr!=prev){print NR, prev, $NF, cnt; cnt=0} {cnt++; prev=curr}' >> ${today}.log

我打算每2或3分钟运行一次此脚本(现在可以进行延迟日志记录).但是不想每次都重新读取整个文件(文件将以GB为单位),因此我尝试使用行号和仅尾部添加行.但这失败了,因为在第二次迭代期间,行号被重置.有一种方法只能在新添加的行上工作.

I intend to run this script every 2 or 3 minutes (delayed logging is okay for now). But dont want to re-read the whole file every time(file will be in GBs), so I tried using line number and tail only added line. But this fails as during the second iteration, line numbers are reset. Is there a way to work only on newly added lines.

Update1: 生成连续的日志文件

Update1: To generate continuous logfile

while :; do echo "$(date +%Y%m%d-%H:%M:%S).21223 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274";sleep $(shuf -i 5-20 -n 1);done >> test_log &

我以test_log作为参数./myscript.sh test_log运行脚本 myscript.sh的内容

I run script with test_log as argument ./myscript.sh test_log Content of myscript.sh

#!/usr/bin/bash

inputfile=$1

tail -f $inputfile | grep "ErrorCode:100107" |cut -d'|' -f1 | awk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK]+1;delete a};LK=key}' FS='[-:]' #does not give output on teminal
# however this works=> cat $inputfile | grep "ErrorCode:100107" |cut -d'|' -f1 | awk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK]+1;delete a};LK=key}' FS='[-:]'

这某种程度上不能使我在终端上连续输出

This somehow does not give me continuous output on terminal

参考: 在bash中的特定时间戳记期间对文件中的行进行计数

推荐答案

gawk将在每次时间戳迭代中更新您的结果 :

This gawk will update your results on each timestamp iteration:

tail -f your_log_file|gawk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK];delete a[LK]};LK=key}' FS='[-:]'

简单且最小的内存开销.

Simple and minimal memory overhead.

说明

FS='[-:]':使用regexF ield S分隔符设置为-:.

FS='[-:]': The Field Separator is set to - and : using a regex.

/RMS_REJECT/:仅考虑拒绝的记录,因此我们寻找适当的模式,当行中不存在时不执行.

/RMS_REJECT/: Only rejected records are taken in consideration, so we look for the appropriated pattern, and do nothing when not present in the line.

key=$2":"$3:根据第二和第三字段组成一个key.

key=$2":"$3: Composes a key based in second and third field.

a[key]++:将密钥存储在关联数组(哈希类型)中,并累积显示该密钥的次数.

a[key]++: Stores the key in an associate array (kind of hash) and accumulates the number of times that key is seen.

我们将暂时跳过if部分的说明.

LK=key:LK表示 Last Key ,因此我们使用此变量存储处理过的 last 密钥.

LK=key: LK stands for Last Key, so we use this variable to store the last key processed.

if (LK && LK != key){print LK,a[LK];delete a[LK]}:当我们的 Last Key 具有一个值并且其内容与当前值(key var)不同时,请打印 Last Key 及其哈希值并从数组中删除键,以避免内存开销.

if (LK && LK != key){print LK,a[LK];delete a[LK]}: When our Last Key has a value and its content differs from the current one (key var), print the Last Key and its hash value and delete de key from the array to avoid memory overhead.

示例

[klashxx]$ while :; do echo "$(date +%Y%m%d-%H:%M:%S).21223 ResponseType:RMS_REJECT";sleep $(shuf -i 5-20 -n 1);done >> test_log &
[1] 4040
[klashxx]$ tail -f test_log |gawk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK];delete a[LK]};LK=key}' FS='[-:]'
09:22 6
09:23 5
09:24 6
09:25 3
^C 

(按下Control + C)

[klashxx]$ cat test_log 
20170405-09:22:13.21223 ResponseType:RMS_REJECT
20170405-09:22:20.21223 ResponseType:RMS_REJECT
20170405-09:22:29.21223 ResponseType:RMS_REJECT
20170405-09:22:44.21223 ResponseType:RMS_REJECT
20170405-09:22:53.21223 ResponseType:RMS_REJECT
20170405-09:23:07.21223 ResponseType:RMS_REJECT
20170405-09:23:22.21223 ResponseType:RMS_REJECT
20170405-09:23:38.21223 ResponseType:RMS_REJECT
20170405-09:23:45.21223 ResponseType:RMS_REJECT
20170405-09:23:55.21223 ResponseType:RMS_REJECT
20170405-09:24:05.21223 ResponseType:RMS_REJECT
20170405-09:24:16.21223 ResponseType:RMS_REJECT
20170405-09:24:24.21223 ResponseType:RMS_REJECT
20170405-09:24:31.21223 ResponseType:RMS_REJECT
20170405-09:24:43.21223 ResponseType:RMS_REJECT
20170405-09:24:56.21223 ResponseType:RMS_REJECT
20170405-09:25:13.21223 ResponseType:RMS_REJECT
20170405-09:25:23.21223 ResponseType:RMS_REJECT
20170405-09:25:43.21223 ResponseType:RMS_REJECT
20170405-09:26:01.21223 ResponseType:RMS_REJECT


PS

不要grepcut,只需使用gawk选中此

Don't grep and cut, just use gawk check this answer.

PS2

根据您的代码,最终的外观应如下所示:

Based on your code, the final thing should look this way:

#!/usr/bin/bash

inputfile=$1

tail -f $inputfile | gawk -v errorcode="100107" '/RMS_REJECT/ && $20 == errorcode{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK];delete a[LK]};LK=key}' FS='[-:|]'

这篇关于使用bash中连续添加的日志文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆