How to convert a date string to a timestamp in gawk?

Question

I am scanning through a log file formatted like this:

76.69.120.244 - - [09/Jun/2015:17:13:18 -0700] "GET /file.jpg HTTP/1.1" 200 22977 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36" "16543" "ewr1" "0.002" "CA" "Bell Canada" "2"
76.69.120.244 - - [09/Jun/2015:17:13:19 -0700] "GET /differentfile.bin HTTP/1.1" 206 453684 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36" "16543" "ewr1" "1.067" "CA" "Bell Canada" "2"

Inside gawk, I'm getting that request time using:

requesttime=$4;

What's the best way for me to parse that into a UTC/GMT based time, preferably an epoch timestamp?

I can at least always guarantee that it will be in -0700 if that helps; perhaps some kind of ugly string transformation to add those 7 hours on to it?

Answer

This does the main part of the work: it converts your date and time (ignoring the -0700) to a number of seconds since the epoch, interpreting them in your current local time zone:

$ cat tst.awk
BEGIN { FS="[][]" }
{
    split($2,a,"[/: ]")
    match("JanFebMarAprMayJunJulAugSepOctNovDec",a[2])
    a[2] = sprintf("%02d",(RSTART+2)/3)
    secs = mktime(a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6])
    print $2, "->", secs
}

$ awk -f tst.awk file
09/Jun/2015:17:13:18 -0700 -> 1433887998
09/Jun/2015:17:13:19 -0700 -> 1433887999

Then you can either do some math on the seconds or set the TZ variable appropriately before calling awk, for example (I don't know whether this is the right TZ for your data or not):

$ TZ=UTC awk -f tst.awk file
09/Jun/2015:17:13:18 -0700 -> 1433869998
09/Jun/2015:17:13:19 -0700 -> 1433869999

You can get your current local time zone offset with strftime("%z"):

$ awk 'BEGIN{print strftime("%z")}'
-0500

So a final solution that includes the offset calculation might be, or include, something like this (check the math, since you didn't show your expected output and I might be misinterpreting what your data means to you!):

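$ cat tst.awk
# convert a "+HHMM" or "-HHMM" offset string to signed seconds east of UTC
function off2secs(off,    sign) {
    sign = (substr(off,1,1) == "-") ? -1 : 1
    return sign * (substr(off,2,2)*3600 + substr(off,4,2)*60)
}
BEGIN { FS="[][]" }
{
    split($2,a,"[/: ]")
    match("JanFebMarAprMayJunJulAugSepOctNovDec",a[2])
    a[2] = sprintf("%02d",(RSTART+2)/3)
    secs = mktime(a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6])  # fields read as local time
    locOffset = strftime("%z", secs)                       # local offset at that moment
    secs = secs + off2secs(locOffset) - off2secs(a[7])     # shift to the real UTC epoch
    print $2, "->", secs
}

$ awk -f tst.awk file
09/Jun/2015:17:13:18 -0700 -> 1433895198
09/Jun/2015:17:13:19 -0700 -> 1433895199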

Or, if you like brevity and puzzles ;-):

$ cat tst.awk
BEGIN { FS="[][]" }
{
    split($2,a,"[/: ]")
    # %z and a[7] are +/-HHMM, so for whole-hour offsets (difference * 36) is seconds
    print $2, "->", mktime(a[3]" "(match("JanFebMarAprMayJunJulAugSepOctNovDec",a[2])+2)/3" "a[1]" "a[4]" "a[5]" "a[6]) + (strftime("%z") - a[7])*36
}

$ awk -f tst.awk file
09/Jun/2015:17:13:18 -0700 -> 1433895198
09/Jun/2015:17:13:19 -0700 -> 1433895199
