How to calculate the difference between start and end dates using awk

Problem description

I need to print, in a new field, the difference in days between the start and end dates ($6) of the records for each unique ID ($5). The data looks like this:

7  65  2    5   32070  2010-12-14    13:25:30    
7  82  2    10  41920  2010-12-14    11:30:45  
7  83  1    67  29446  2010-12-14    04:15:25      
7  81  1    47  32070  2011-5-11     08:14:20  
7  83  1    67  29446  2011-6-22     07:13:24
7  82  2    10  41920  2011-5-14     06:15:25  

I need to see the following:

7  65  2    5   32070  2010-12-14    13:25:30   147  
7  82  2    10  41920  2010-12-14    11:30:45   150  
7  83  1    67  29446  2010-12-14    04:15:25   189  
7  81  1    47  32070  2011-5-11     08:14:20   147  
7  83  1    67  29446  2011-6-22     07:13:24   189  
7  82  2    10  41920  2011-5-14     06:15:25   150 

I used the following code, but it gives me an error message. Could you help me, or suggest another option?

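awk '{
       split($6, arr, "-")                           # arr = year, month, day
       a = sprintf("%s %s %s 0 0 0", arr[1], arr[2], arr[3])
       d = mktime(a)                                 # mktime() is a gawk extension, not POSIX awk
       delta[$5] = delta[$5] " " d                   # append epoch seconds for this ID
     }
     END { for (i in delta) print i, delta[i] }' filename > tmp.dat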

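awk '{
       if (FILENAME == "tmp.dat") {    # first file: ID followed by its epoch times
         delta[$1] = $0
         next
       }
       if (FILENAME == "filename") {   # second file: the original data
         a = "-1"
         if ($5 in delta) {
           cnt = split(delta[$5], arr)
           if (cnt == 3) {             # ID plus exactly two times
             a = arr[3] - arr[2]
             a /= 86400
             a = int(a)
           }
         }
         print $0, a
         next
       }
     }' tmp.dat filename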

Recommended answer

In awk: the source file is read twice. On the first pass the time difference is computed; on the second pass each record is printed with its time difference appended.

$ awk 'NR==FNR {
           c = "date -d \""$6 "\" +%s";   # use GNU date for epoch time seconds
           c | getline d;                 # execute command in c var, output to d
           close(c);                      # close the pipe so a repeated date is run again
           a[$5] = (($5 in a) ? d-a[$5] : d); # set or subtract from array
           next                           # skip to next record
       } {                                # for the second go:
           # $1=$1;                       # uncomment to clean trailing space
           print $0, int(a[$5]/86400)     # print record and time difference
       }' file file
7  65  2    5   32070  2010-12-14    13:25:30     147
7  82  2    10  41920  2010-12-14    11:30:45   150
7  83  1    67  29446  2010-12-14    04:15:25       189
7  81  1    47  32070  2011-5-11     08:14:20   147
7  83  1    67  29446  2011-6-22     07:13:24 189
7  82  2    10  41920  2011-5-14     06:15:25   150
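
The two-pass trick relies on NR == FNR: NR counts records across all input files while FNR restarts at 1 for each file, so the condition holds only while the first copy of the file is being read. A minimal illustration, assuming a throwaway file named demo.txt and a hypothetical shell function two_pass (neither is part of the original answer):

```shell
# Hypothetical helper: two_pass FILE reads FILE twice.
two_pass() {
    awk 'NR == FNR { n++; next }        # first pass: only count the records
         { print FNR "/" n, $0 }        # second pass: the total n is now known
        ' "$1" "$1"
}

printf 'a\nb\nc\n' > demo.txt
two_pass demo.txt
# → 1/3 a
#   2/3 b
#   3/3 c
```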

The spacing before the time difference varies because your data has trailing spaces after $NF. You can trim them with, for example, $1=$1; before the print: assigning to a field makes awk rebuild $0 from the fields joined by OFS (a single space by default).
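
A quick way to see the effect of $1=$1, sketched with a hypothetical shell function squeeze_fields that is not part of the answer; the "|" is appended only to show where each output line ends:

```shell
# Hypothetical helper: copy stdin to stdout with fields re-joined by OFS.
squeeze_fields() {
    # Assigning to $1 (even its own value) makes awk rebuild $0 from the
    # fields joined by OFS, a single space by default; "|" marks line end.
    awk '{ $1 = $1; print $0 "|" }'
}

printf '7  65  2    5   32070  2010-12-14    13:25:30    \n' | squeeze_fields
# → 7 65 2 5 32070 2010-12-14 13:25:30|
```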

EDIT: This expects exactly two records for each unique ID in field $5. When the first occurrence of an ID is found, the date in field $6 (only the date part) is converted to seconds and stored in a[$5]. When the second occurrence is found, the stored time is subtracted from the newly found time and the result is stored back in a[$5]. If an ID occurs more than twice, the value in a[$5], which is already a difference, is subtracted from the next time found and the results become meaningless. Likewise, an ID that occurs only once keeps its raw epoch time, so the printed value is its number of days since 1970.
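
If an ID can occur more than twice, one alternative (not the answerer's method) is to keep the earliest and latest date per ID and print their difference. The sketch below assumes the dates in $6 are Y-M-D, ignores the time in $7, and converts each date to a day number with the well-known days-from-civil formula in plain POSIX awk, so it needs neither gawk's mktime() nor GNU date -d. The names span_days and days are hypothetical. Because it counts calendar days, it prints 148, 151 and 190 for the sample data rather than 147, 150 and 189; the figures in the question and answer are consistent with truncating a local-time difference that is one hour short because it spans a daylight-saving change, though the source does not say so.

```shell
# Hypothetical helper: span_days FILE
# Prints every record of FILE followed by the number of whole calendar days
# between the earliest and latest date ($6, Y-M-D) seen for its ID ($5).
span_days() {
    awk '
    # Day number (days since 1970-01-01) of a Y-M-D string, proleptic
    # Gregorian calendar, days-from-civil formula; assumes year >= 1.
    function days(s,    p, y, m, d, era, yoe, mp, doy, doe) {
        split(s, p, "-")
        y = p[1] + 0; m = p[2] + 0; d = p[3] + 0
        if (m <= 2) y--              # Jan and Feb count as months 10 and 11 of the previous year
        era = int(y / 400)
        yoe = y - era * 400          # year of era, 0..399
        mp  = (m + 9) % 12           # March = 0 ... February = 11
        doy = int((153 * mp + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    NR == FNR {                      # first pass: earliest and latest day per ID
        t = days($6)
        if (!($5 in first) || t < first[$5]) first[$5] = t
        if (!($5 in last)  || t > last[$5])  last[$5]  = t
        next
    }
    {                                # second pass: append the span in days
        $1 = $1                      # collapse blanks, drop trailing spaces
        print $0, last[$5] - first[$5]
    }' "$1" "$1"
}
```

Usage: span_days file. An ID that appears only once gets 0, and the order of its records in the file does not matter.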
