Counting lines in a file during particular timestamps in bash

Problem description

I am scheduling a cron job that runs every minute and reports the number of REJECT lines logged in each minute. The file is written to continuously, so to avoid re-reading it I store the line count from the previous run and resume with tail -n +lastTimeWC. But how do I count the number of REJECT entries per minute? Sample input:

20170327-09:15:01.283619074 ResponseType:REJECT
20170327-09:15:01.287619074 ResponseType:REJECT
20170327-09:15:01.289619074 ResponseType:REJECT
20170327-09:15:01.290619074 ResponseType:REJECT
20170327-09:15:01.291619074 ResponseType:REJECT
20170327-09:15:01.295619074 ResponseType:REJECT
20170327-09:15:01.297619074 ResponseType:REJECT
20170327-09:16:02.283619074 ResponseType:REJECT
20170327-09:16:03.283619074 ResponseType:REJECT
20170327-09:17:02.283619074 ResponseType:REJECT
20170327-09:17:07.283619074 ResponseType:REJECT

Expected output:

9:15 REJECT 7
9:16 REJECT 2
9:17 REJECT 2

Update1: (Using Ed Morton's answer)

#!/usr/bin/bash
while :
do
awk -F '[:-]' '{curr=$2":"$3} (prev!="") && (curr!=prev){print NR, prev, $NF, cnt; cnt=0} {cnt++; prev=curr}' $1
sleep 60
done

This script prints output every 60 seconds, but it should only report timestamps newly added to the log file ($1). Suppose 9:18 gets added: it should then report just the new data, not 9:15 through 9:18 all over again.

Recommended answer

Don't print the last count, since it may not be complete for that timestamp; just print the counts before it:

$ awk -F '[:-]' '{curr=$2":"$3} (prev!="") && (curr!=prev){print prev, $NF, cnt; cnt=0} {cnt++; prev=curr}' file
09:15 REJECT 7
09:16 REJECT 2

If you really WANTED to print the last one too then just add a print in an END section:

$ awk -F '[:-]' '{curr=$2":"$3} (prev!="") && (curr!=prev){print prev, $NF, cnt; cnt=0} {cnt++; prev=curr} END{print prev, $NF, cnt}' file
09:15 REJECT 7
09:16 REJECT 2
09:17 REJECT 2

But I'd imagine you'd have to discard that possibly partial result anyway, so what's the point?

Note that you don't have to store all the results in an array and then print them in the END section; just print them every time the timestamp changes. Besides using memory unnecessarily, solutions that store the results in an array and print them in the END section with a for (key in array) loop will emit them in an unspecified (effectively hash) order, not the order in which the timestamps occur in your input (unless by dumb luck sometimes).

Rather than storing the line count of your input file (which can give wrong results when one timestamp's lines are split across invocations of the script, and which makes it impossible to use logrotate or similar to truncate the log file as it gets long or old), store the last timestamp analyzed and start after it on the next iteration. For example, do the equivalent of this with cron:

while :
do
    # mapfile keeps each output line (e.g. "09:15 REJECT 7") as one array element
    mapfile -t results < <(awk -F '[:-]' -v last="$lastTimeStamp" '{curr=$2":"$3} curr<=last{next} (prev!="") && (curr!=prev){print prev, $NF, cnt; cnt=0} {cnt++; prev=curr}' file)
    numResults="${#results[@]}"
    if (( numResults > 0 ))
    then
        printf '%s\n' "${results[@]}"
        (( lastIndex = numResults - 1 ))
        lastResult="${results[$lastIndex]}"
        # remember the last minute reported so the next pass skips it (curr<=last)
        lastTimeStamp="${lastResult%% *}"
    fi
    sleep 60
done
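
The loop above keeps lastTimeStamp in a shell variable, which a cron job cannot do because each run is a fresh process. A minimal sketch of one way to carry that value between runs is to keep it in a small state file. The function name report_new_minutes, the state-file idea, and the demo data are assumptions made for illustration, not part of the original answer; like the loop above, it compares only HH:MM, so it does not handle logs that span midnight.

```shell
#!/usr/bin/env bash
# report_new_minutes LOG STATE   (hypothetical helper, for illustration only)
# Prints counts for minutes completed since the previous run and records the
# last reported minute in STATE so the next run can skip it.
report_new_minutes() {
    local log="$1" state="$2" last="" out last_line
    [ -s "$state" ] && last=$(cat "$state")
    out=$(awk -F '[:-]' -v last="$last" '
        { curr = $2 ":" $3 }
        curr <= last { next }          # minute already reported on an earlier run
        (prev != "") && (curr != prev) { print prev, $NF, cnt; cnt = 0 }
        { cnt++; prev = curr }
    ' "$log")
    [ -z "$out" ] && return 0
    printf '%s\n' "$out"
    # The first word of the last output line is the newest completed minute.
    last_line=$(printf '%s\n' "$out" | tail -n 1)
    printf '%s\n' "${last_line%% *}" > "$state"
}

# Demo with a throwaway log and state file (made-up data).
log=$(mktemp)
state=$(mktemp)
printf '%s\n' \
    '20170327-09:15:01.283619074 ResponseType:REJECT' \
    '20170327-09:15:01.287619074 ResponseType:REJECT' \
    '20170327-09:16:02.283619074 ResponseType:REJECT' > "$log"
first=$(report_new_minutes "$log" "$state")    # 09:15 is complete, 09:16 is not
printf '%s\n' '20170327-09:17:02.283619074 ResponseType:REJECT' >> "$log"
second=$(report_new_minutes "$log" "$state")   # 09:16 is complete now
printf 'first run:  %s\nsecond run: %s\n' "$first" "$second"
rm -f "$log" "$state"
```

With a function like this saved in a script, the cron entry would simply call it once a minute with the log and state-file paths of your choosing.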

Alternatively, if you want to keep using line numbers so that you can use tail, then rather than using wc -l to get the length of the file (which would include the lines for the current timestamp, whose potentially incomplete results you are not printing), have awk print the line number of the first line after the last line associated with each timestamp:

$ awk -F '[:-]' '{curr=$2":"$3} (prev!="") && (curr!=prev){print NR, prev, $NF, cnt; cnt=0} {cnt++; prev=curr}' file
8 09:15 REJECT 7
10 09:16 REJECT 2

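Strip that leading line number off before printing each result, and save the last one. That saved value is the start line for the next iteration: tail -n +<startLineNr> file | awk '...'. Note that once awk reads from tail, NR is relative to the tail output, so add the previous start line minus 1 to get the line number in the file.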
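
The line-number variant can be sketched roughly as follows. The function name report_from_line and the state file are hypothetical, and the demo reuses the sample input from the question plus one invented 09:18 line. The NR that awk reports is relative to what tail emits, so the sketch converts it back to an absolute line number before saving it.

```shell
#!/usr/bin/env bash
# report_from_line LOG STATE   (hypothetical helper, for illustration only)
# STATE holds the 1-based line number at which the next run should resume.
report_from_line() {
    local log="$1" state="$2" start=1 out rel
    [ -s "$state" ] && start=$(cat "$state")
    out=$(tail -n "+$start" "$log" | awk -F '[:-]' '
        { curr = $2 ":" $3 }
        (prev != "") && (curr != prev) { print NR, prev, $NF, cnt; cnt = 0 }
        { cnt++; prev = curr }
    ')
    [ -z "$out" ] && return 0
    # First field of the last line = position, within the tail output, of the
    # first line of the still-open minute; convert it to a file line number.
    rel=$(printf '%s\n' "$out" | tail -n 1 | cut -d ' ' -f 1)
    echo $(( start + rel - 1 )) > "$state"
    # Drop the line-number column before printing the results.
    printf '%s\n' "$out" | cut -d ' ' -f 2-
}

# Demo: the sample input from the question in a throwaway file.
log=$(mktemp)
state=$(mktemp)
cat > "$log" <<'EOF'
20170327-09:15:01.283619074 ResponseType:REJECT
20170327-09:15:01.287619074 ResponseType:REJECT
20170327-09:15:01.289619074 ResponseType:REJECT
20170327-09:15:01.290619074 ResponseType:REJECT
20170327-09:15:01.291619074 ResponseType:REJECT
20170327-09:15:01.295619074 ResponseType:REJECT
20170327-09:15:01.297619074 ResponseType:REJECT
20170327-09:16:02.283619074 ResponseType:REJECT
20170327-09:16:03.283619074 ResponseType:REJECT
20170327-09:17:02.283619074 ResponseType:REJECT
20170327-09:17:07.283619074 ResponseType:REJECT
EOF
first=$(report_from_line "$log" "$state")     # reports 09:15 and 09:16
# Invented extra line so that 09:17 becomes a completed minute.
printf '%s\n' '20170327-09:18:00.283619074 ResponseType:REJECT' >> "$log"
second=$(report_from_line "$log" "$state")    # reports only 09:17
next_start=$(cat "$state")
printf '%s\n' "$first" "$second" "next start line: $next_start"
rm -f "$log" "$state"
```

As noted earlier in the answer, this approach stops working as soon as the log is truncated or rotated, because the saved line number then points into a different file.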

By the way, your sample input doesn't show this, but if your log file contains lines that don't end in REJECT and you want them ignored, just add $NF!="REJECT"{next} at the start of the awk script.
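
As a small illustration of that filter, the sketch below runs the same awk program over a made-up log that mixes in other response types. The ACCEPT lines and the file contents are assumptions for demonstration only; they do not come from the question.

```shell
#!/usr/bin/env bash
# Hypothetical sample log: the ACCEPT lines are invented for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
20170327-09:15:01.283619074 ResponseType:REJECT
20170327-09:15:01.287619074 ResponseType:ACCEPT
20170327-09:15:01.289619074 ResponseType:REJECT
20170327-09:16:02.283619074 ResponseType:ACCEPT
20170327-09:16:03.283619074 ResponseType:REJECT
20170327-09:17:02.283619074 ResponseType:REJECT
EOF

# Skip any line whose last field is not REJECT, then count per minute as before.
counts=$(awk -F '[:-]' '
    $NF != "REJECT" { next }
    { curr = $2 ":" $3 }
    (prev != "") && (curr != prev) { print prev, $NF, cnt; cnt = 0 }
    { cnt++; prev = curr }
' "$log")

printf '%s\n' "$counts"
rm -f "$log"
```

Because the filter rule comes first, skipped lines never reach the counting rules, so they neither increment cnt nor trigger a timestamp change.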
