Extract the unpredictable data that have its own timestamp in a log file using a Shell script


Problem Description


log.txt will be as below: ID data, each with its own timestamp (detection_time), which will continuously update in this log.txt file. The ID is an unpredictable number. It could be anything from 0000-9999, and the same ID could appear in the log.txt again.

My goal is to filter the ID that appears again in the log.txt within 15 sec from its first appearance, using a shell script. Can anyone help me with this?

ID = 4231
detection_time = 1595556730 
ID = 3661
detection_time = 1595556731
ID = 2654
detection_time = 1595556732
ID = 3661
detection_time = 1595556733

To be more clear: from the log.txt above, ID 3661 first appears at time 1595556731 and then appears again at 1595556733, just 2 sec after its first appearance. So it matches my condition, which is an ID that appears again within 15 sec. I would like this ID 3661 to be filtered out by my shell script.

The output after running the shell script will be ID = 3661

My problem is that I don't know how to develop the programming algorithm in a shell script.

Here's what I tried, using ID_new and ID_previous variables, but ID_previous=$(ID_new) and detection_previous=$(detection_new) are not working:

input="/tmp/log.txt"
ID_previous=""
detection_previous=""
while IFS= read -r line
do
    ID_new=$(echo "$line" | grep "ID =" | awk -F " " '{print $3}')
    echo $ID_new
    detection_new=$(echo "$line" | grep "detection_time =" | awk -F " " '{print $3}')
    echo $detection_new
    ID_previous=$(ID_new)
    detection_previous=$(detection_new)
done < "$input"

EDIT: In log.txt the data actually comes in sets containing ID, detection_time, Age and Height. Sorry for not mentioning this in the first place.

ID = 4231
detection_time = 1595556730 
Age = 25
Height = 182
ID = 3661
detection_time = 1595556731
Age = 24
Height = 182
ID = 2654
detection_time = 1595556732
Age = 22
Height = 184    
ID = 3661
detection_time = 1595556733
Age = 27
Height = 175
ID = 3852
detection_time = 1595556734
Age = 26
Height = 156
ID = 4231
detection_time = 1595556735 
Age = 24
Height = 184

I've tried the Awk solution. The result is 4231 3661 2654 3852 4231, which is all the IDs in the log.txt. The correct output should be 4231 3661.

From this, I think the Age and Height data might affect the Awk solution because they are inserted between the fields of interest, ID and detection_time.

Solution

Assuming the timestamps in the log file are monotonically increasing, you only need a single pass with Awk. For each id, keep track of the latest time it was reported (use an associative array t where the key is the id and the value is the latest timestamp). If you see the same id again and the difference between the timestamps is at most 15 seconds, report it.

For good measure, keep a second array p of the ones we have already reported so we don't report them twice.

awk '/^ID = / { id=$3; next }       # remember the most recent ID
    # Skip if this line is neither ID nor detection_time
    !/^detection_time = / { next }
    # Same id seen before, last timestamp at most 15 seconds ago, not yet reported
    (id in t) && (t[id] >= $3-15) && !(p[id]) { print id; ++p[id]; next }
    # Remember the latest detection_time for this id
    { t[id] = $3 }' /tmp/log.txt

If you really insist on doing this natively in Bash, I would refactor your attempt to

declare -A dtime printed
while read -r field _ value
do
    case $field in
        ID) id=$value ;;
        detection_time)
            # An unset dtime["$id"] evaluates to 0 inside the arithmetic
            # comparison, so an ID's first occurrence never prints anything
            if [[ dtime["$id"] -ge $((value - 15)) ]]; then
                [[ -v printed["$id"] ]] || echo "$id"
                printed["$id"]=1
            fi
            dtime["$id"]=$value ;;
    esac
done < /tmp/log.txt

Notice how read -r can easily split a line on whitespace just as well as Awk can, as long as you know how many fields you can expect. But while read -r is typically an order of magnitude slower than Awk, and you'll have to agree that the Awk attempt is more succinct and elegant, as well as portable to older systems.
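
As a minimal illustration of that splitting (the _ here is just a throwaway variable that absorbs the literal = in the middle of each line):

read -r field _ value <<< "ID = 4231"
echo "$field / $value"    # prints: ID / 4231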

(Associative arrays were introduced in Bash 4.)

Tangentially, anything that looks like grep 'x' | awk '{ y }' can be refactored to awk '/x/ { y }'; see also useless use of grep.
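
Applied to the pipeline from your own attempt, that refactoring would look like

ID_new=$(echo "$line" | grep "ID =" | awk -F " " '{print $3}')    # before
ID_new=$(awk '/ID =/ { print $3 }' <<< "$line")                   # after

(and -F " " can be dropped entirely, since splitting on whitespace is Awk's default).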

Also, notice that $(foo) attempts to run foo as a command. To simply refer to the value of the variable foo, the syntax is $foo (or, optionally, ${foo}, but the braces add no value here). Usually you will want to double-quote the expansion "$foo"; see also When to wrap quotes around a shell variable.
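
Concretely, the failing assignments from your loop should simply be

ID_previous=$ID_new
detection_previous=$detection_new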

Your script would only remember a single earlier event; the associative array allows us to remember all the ID values we have seen previously (until we run out of memory).

Nothing prevents us from using human-readable variable names in Awk either; feel free to substitute printed for p and dtime for t to have complete parity with the Bash alternative.
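
The substitution is purely mechanical; the renamed version would read

awk '/^ID = / { id=$3; next }
    !/^detection_time = / { next }
    (id in dtime) && (dtime[id] >= $3-15) && !(printed[id]) { print id; ++printed[id]; next }
    { dtime[id] = $3 }' /tmp/log.txt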
