How to calculate the starting and end date difference using awk
Problem description
I need to print, in a new field, the difference in days between the start and end dates ($6) of the records for each unique ID ($5).
The data looks like this:
7 65 2 5 32070 2010-12-14 13:25:30
7 82 2 10 41920 2010-12-14 11:30:45
7 83 1 67 29446 2010-12-14 04:15:25
7 81 1 47 32070 2011-5-11 08:14:20
7 83 1 67 29446 2011-6-22 07:13:24
7 82 2 10 41920 2011-5-14 06:15:25
I need to see the following:
7 65 2 5 32070 2010-12-14 13:25:30 147
7 82 2 10 41920 2010-12-14 11:30:45 150
7 83 1 67 29446 2010-12-14 04:15:25 189
7 81 1 47 32070 2011-5-11 08:14:20 147
7 83 1 67 29446 2011-6-22 07:13:24 189
7 82 2 10 41920 2011-5-14 06:15:25 150
I used the following code, but it gives me an error message. Could you help me if you have another option?
awk '{
    split($6,arr,"-")
    a=sprintf("%s %s %s 0 0 0",arr[1], arr[2], arr[3])
    d=mktime(a)
    delta[$5]=delta[$5] " " d
}
END {for(i in delta) {print i, delta[i]} }' filename > tmp.dat
awk '{
    if (FILENAME=="tmp.dat" )
    {
        delta[$1]=$0;
        next
    }
    if (FILENAME=="filename")
    {
        a="-1"
        if($5 in delta)
        {
            cnt=split(delta[$5],arr)
            if(cnt==3)
            {
                a=arr[3] - arr[2]
                a/=86400
                a=int(a)
            }
        }
        print $0, a
        next
    }
}' tmp.dat filename
Recommended answer
In awk. The source file is read twice. On the first pass the time difference is computed; on the second pass the records are output with the time difference appended.
$ awk 'NR==FNR {
    c = "date -d \"" $6 "\" +%s"          # build a GNU date command: date part -> epoch seconds
    c | getline d                         # run the command in c, read its output into d
    close(c)                              # close the pipe so a repeated date is run again
    a[$5] = (($5 in a) ? d-a[$5] : d)     # first sighting: store; second: store the difference
    next                                  # skip to the next record
}
{                                         # second pass:
    # $1=$1                               # uncomment to remove the trailing space
    print $0, int(a[$5]/86400)            # print the record and the difference in days
}' file file
7 65 2 5 32070 2010-12-14 13:25:30 147
7 82 2 10 41920 2010-12-14 11:30:45 150
7 83 1 67 29446 2010-12-14 04:15:25 189
7 81 1 47 32070 2011-5-11 08:14:20 147
7 83 1 67 29446 2011-6-22 07:13:24 189
7 82 2 10 41920 2011-5-14 06:15:25 150
The spacing before the time difference varies because your data has trailing whitespace after $NF. You can trim it with, for example, $1=$1; before the print.
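As a small illustration of that trimming step, the sketch below feeds awk a made-up two-field line with a trailing space (the sample text and the appended `x` are placeholders, not part of the question's data). Assigning `$1 = $1` makes awk rebuild the record from its fields joined by single spaces, which drops the trailing whitespace.

```shell
# A made-up input line "a b " (note the trailing space), tagged with "x".
# Without $1 = $1 the output would be "a b  x" (two spaces before x).
out=$(printf 'a b \n' | awk '{ $1 = $1; print $0, "x" }')
printf '%s\n' "$out"
```

Keep in mind that rebuilding the record also collapses any runs of spaces or tabs between fields into single spaces.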
EDIT: This expects exactly two records for each unique ID in field $5. When the first occurrence of an ID is found, the date in field $6 (only the date part, not the time in $7) is converted to seconds and stored in a[$5]. When the next one is found, the stored value is subtracted from the newly found time and the result is stored back in a[$5]. If an ID occurs more than twice, the value in a[$5] (by then a difference, not a time) is subtracted from the latest time, and chaos results.
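If an ID can occur more than twice, one possible variation is to track the earliest and latest date per ID instead of subtracting in place. The following is only a sketch under stated assumptions, not the answer's method: the names `span_days`, `daynum`, `lo` and `hi` are made up here, and the call to `date` is swapped for plain calendar arithmetic so that it runs in any POSIX awk. Because it counts calendar days, it can report one day more than a result derived from local-time epoch seconds when a daylight-saving change falls between the two dates (for example 148 rather than 147 for 2010-12-14 to 2011-5-11).

```shell
# span_days FILE: append to each record the number of days between the
# earliest and the latest date ($6, written as YYYY-M-D) seen for its ID ($5).
span_days() {
    awk '
    # Day number of a Gregorian date; only differences between values matter.
    function daynum(y, m, d,    n) {
        if (m < 3) { y--; m += 12 }               # count Jan/Feb as months 13/14
        n = 365 * y + int(y / 4) - int(y / 100) + int(y / 400)
        return n + int((153 * (m - 3) + 2) / 5) + d
    }
    NR == FNR {                                   # first pass: min and max per ID
        split($6, p, "-")
        n = daynum(p[1] + 0, p[2] + 0, p[3] + 0)
        if (!($5 in lo) || n < lo[$5]) lo[$5] = n
        if (!($5 in hi) || n > hi[$5]) hi[$5] = n
        next
    }
    { $1 = $1; print $0, hi[$5] - lo[$5] }        # second pass: append the span
    ' "$1" "$1"
}
```

It would be called as `span_days file`. An ID that appears only once gets a span of 0.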