计算基于时间的度量(每小时) [英] Calculate time based metrics(hourly)

查看:135
本文介绍了计算基于时间的度量(每小时)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我将如何计算基于日志文件数据基于时间的指标(小时平均)?

How would I calculate time-based metrics (hourly average) based on log file data?

让我这个更清晰,考虑包含的条目如下日志文件:每个UID的出现只有两次在日志中。他们将在嵌入式XML格式。他们将有可能出现乱序。和日志文件将数据只为一个日子,所以只用了一天的记录将在那里。

let me make this more clear, consider a log file that contains entries as follows: Each UIDs appears only twice in the log. they will be in embeded xml format. And they will likely appear OUT of sequence. And the log file will have data for only one day so only one day records will be there.

没有的UID是2百万的日志文件中。

No of UIDs are 2 millions in log file.

我必须找出每小时平均响应时间为这些请求。下面有请求,并在日志文件中的反应。 UID是关联的B / W请求和响应的关键。

I have to find out average hourly reponse time for these requests. Below has request and response in log file. UID is the key to relate b/w request and response.

2013-04-03 08:54:19,451 INFO  [Logger] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><log-event><message-time>2013-04-03T08:54:19.448-04:00</message-time><caller>PCMC.common.manage.springUtil</caller><body>&lt;log-message-body&gt;&lt;headers&gt;&amp;lt;FedDKPLoggingContext id="DKP_DumpDocumentProperties" type="context.generated.FedDKPLoggingContext"&amp;gt;&amp;lt;logFilter&amp;gt;7&amp;lt;/logFilter&amp;gt;&amp;lt;logSeverity&amp;gt;255&amp;lt;/logSeverity&amp;gt;&amp;lt;schemaType&amp;gt;PCMC.MRP.DocumentMetaData&amp;lt;/schemaType&amp;gt;&amp;lt;UID&amp;gt;073104c-4e-4ce-bda-694344ee62&amp;lt;/UID&amp;gt;&amp;lt;consumerSystemId&amp;gt;JTR&amp;lt;/consumerSystemId&amp;gt;&amp;lt;consumerLogin&amp;gt;jbserviceid&amp;lt;/consumerLogin&amp;gt;&amp;lt;logLocation&amp;gt;Beginning of Service&amp;lt;/logLocation&amp;gt;&amp;lt;/fedDKPLoggingContext&amp;gt;&lt;/headers&gt;&lt;payload&gt;  
&amp;lt;ratedDocument&amp;gt;  
    &amp;lt;objectType&amp;gt;OLB_BBrecords&amp;lt;/objectType&amp;gt;  
    &amp;lt;provider&amp;gt;JET&amp;lt;/provider&amp;gt;  
    &amp;lt;metadata&amp;gt;&amp;amp;lt;BooleanQuery&amp;amp;gt;&amp;amp;lt;Clause   occurs=&amp;amp;quot;must&amp;amp;quot;&amp;amp;gt;&amp;amp;lt;TermQuery   fieldName=&amp;amp;quot;RegistrationNumber&amp;amp;quot;&amp;amp;gt;44565153050735751&amp;amp;lt;/TermQuery&amp;amp;gt;&amp;amp;lt;/Clause&amp;amp;gt;&amp;amp;lt;/BooleanQuery&amp;amp;gt;&amp;lt;/metadata&amp;gt;  
&amp;lt;/ratedDocument&amp;gt;  
&lt;/payload&gt;&lt;/log-message-body&gt;</body></log-event>

2013-04-03 08:54:19,989 INFO  [Logger] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><log-event><message-time>2013-04-03T08:54:19.987-04:00</message-time><caller>PCMC.common.manage.springUtil</caller><body>&lt;log-message-body&gt;&lt;headers&gt;&amp;lt;fedDKPLoggingContext id="DKP_DumpDocumentProperties" type="context.generated.FedDKPLoggingContext"&amp;gt;&amp;lt;logFilter&amp;gt;7&amp;lt;/logFilter&amp;gt;&amp;lt;logSeverity&amp;gt;255&amp;lt;/logSeverity&amp;gt;&amp;lt;schemaType&amp;gt;PCMC.MRP.DocumentMetaData&amp;lt;/schemaType&amp;gt;&amp;lt;UID&amp;gt;073104c-4e-4ce-bda-694344ee62&amp;lt;/UID&amp;gt;&amp;lt;consumerSystemId&amp;gt;JTR&amp;lt;/consumerSystemId&amp;gt;&amp;lt;consumerLogin&amp;gt;jbserviceid&amp;lt;/consumerLogin&amp;gt;&amp;lt;logLocation&amp;gt;Successful Completion of    Service&amp;lt;/logLocation&amp;gt;&amp;lt;/fedDKPLoggingContext&amp;gt;&lt;/headers&gt;&lt;payload&gt;0&lt;/payload&gt;&lt;/log-message-body&gt;</body></log-event>

这里是bash脚本我写的。

here is the bash script I wrote.

uids=cat $i|grep "Service" |awk 'BEGIN {FS="lt;";RS ="gt;"} {print $2;}'| sort -u
for uid in ${uids}; do  
    count=`grep "$uid" test.log|wc -l`
    if [ "${count}" -ne "0" ]; then
        unique_uids[counter]="$uid"
        let counter=counter+1   
    fi   
done


echo ${unique_uids[@]}   
echo $counter  
echo " Unique No:" ${#unique_uids[@]}
echo uid StartTime EndTime" > $log

for unique_uids in ${unique_uids[@]} ; do
    responseTime=`cat $i|grep "${unique_uids}" |awk '{split($2,Arr,":|,"); print Arr[1]*3600000+Arr[2]*60000+Arr[3]*1000+Arr[4]}'|sort -n`
    echo $unique_uids $responseTime >> $log
done

和输出应该是这样的
操作来自ID,消费者来自documentmetadata和小时为时间08:54:XX
因此,如果我们有一个以上的请求和响应,那么需要的平均时间来请求在那个时刻的响应。

And the output should be like this Operation comes from id, Consumer comes from documentmetadata and hour is the time 08:54:XX So if we have more than one request and response then need to average of the response times for requests came at that hour.

操作消费者HOUR平均响应时间(毫秒)搜索
DKP_DumpDocumentProperties MRP 08 538

Operation Consumer HOUR Avg-response-time(ms)
DKP_DumpDocumentProperties MRP 08 538

推荐答案

由于您发布的输入文件:

Given your posted input file:

$ cat file
2013-04-03 08:54:19,989 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&amp;lt;UId&amp;gt;904c-be-4e-bbda-3e62&amp;lt;/UId&amp;gt;&amp;lt;</body></event>
2013-04-03 08:54:39,389 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&amp;lt;UId&amp;gt;904c-be-4e-bbda-3e62&amp;lt;/UId&amp;gt;&amp;lt;</body></event>
2013-04-03 08:54:34,979 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&amp;lt;UId&amp;gt;edfc-fr-5e-bced-3443&amp;lt;/UId&amp;gt;&amp;lt;</body></event>
2013-04-03 08:55:19,569 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&amp;lt;UId&amp;gt;edfc-fr-5e-bced-3443&amp;lt;/UId&amp;gt;&amp;lt;</body></event>

本GNU awk脚本(您正在使用GNU awk的,因为你设置RS在你在你的问题张贴的脚本多字符串)

This GNU awk script (you are using GNU awk since you set RS to a multi-character string in the script you posted in your question)

$ cat tst.awk
{
    date = $1
    time = $2
    guid = gensub(/.*;gt;([^&]+).*/,"\\1","")

    print guid, date, time
}

将退出我认为是你关心的信息:

will pull out what I THINK is the information you care about:

$ gawk -f tst.awk file
904c-be-4e-bbda-3e62 2013-04-03 08:54:19,989
904c-be-4e-bbda-3e62 2013-04-03 08:54:39,389
edfc-fr-5e-bced-3443 2013-04-03 08:54:34,979
edfc-fr-5e-bced-3443 2013-04-03 08:55:19,569

剩下的就是简单的数学,对不对?并做到这一点在这个awk脚本 - 不要去awk的输出通过管道向一些愚蠢的外壳环

The rest is simple math, right? And do it in this awk script - don't go piping the awk output to some goofy shell loop!

这篇关于计算基于时间的度量(每小时)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆