Format and filter a file into a CSV table

Question

I have a file that contains many log entries:

PS: This question is inspired by a previous question, but slightly improved.

at 10:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR5> [STR6 STR7] STR8:
academy/course1:oftheory:SMTGHO:nothing:
academy/course1:ofapplicaton:SMTGHP:onehour:

at 10:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR78> [STR6 STR111] STR8:
academy/course2:oftheory:SMTGHM:math:
academy/course2:ofapplicaton:SMTGHN:twohour:

at 10:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR758> [STR6 STR155] STR8:
academy/course3:oftheory:SMTGHK:geo:
academy/course3:ofapplicaton:SMTGHL:halfhour:

at 10:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR87> [STR6 STR74] STR8:
academy/course4:oftheory:SMTGH:SMTGHI:history:
academy/course4:ofapplicaton:SMTGHJ:nothing:

at 14:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR11> [STR6 STR784] STR8:
academy/course5:oftheory:SMTGHG:nothing:
academy/course5:ofapplicaton:SMTGHH:twohours:

at 14:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR86> [STR6 STR85] STR8:
academy/course6:oftheory:SMTGHE:music:
academy/course6:ofapplicaton:SMTGHF:twohours:

at 14:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR96> [STR6 STR01] STR8:
academy/course7:oftheory:SMTGHC:programmation:
academy/course7:ofapplicaton:SMTGHD:onehours:

at 14:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR335> [STR6 STR66] STR8:
academy/course8:oftheory:SMTGHA:philosophy:
academy/course8:ofapplicaton:SMTGHB:nothing:

I tried the code below, but in vain:

BEGIN {
    # set records separated by empty lines
    RS=""
    # set fields separated by newline, each record has 3 fields
    FS="\n"
}
{
    # remove undesired parts of every first line of a record
    sub("at ", "", $1)
    # now store the rest in time and course
    time=$1
    course=$1
    # remove time from string to extract the course title
    sub("^[^ ]* ", "", course)
    # remove course title to retrieve time from string
    sub(course, "", time)
    # get theory info from second line per record
    sub("course:theory:", "", $2)
    # get application info from third line
    sub("course:applicaton:", "", $3)
    # if new course
    if (! (course in header)) {
        # save header information (first words of each line in output)
        header[course] = course
        theory[course] = "theory"
        app[course] = "application"
    }
    # append the relevant info to the output strings
    header[course] = header[course] "," time
    theory[course] = theory[course] "," $2
    app[course] = app[course] "," $3

}
END {
    # now for each course found
    for (key in header) {
        # print the strings constructed
        print header[key]
        print theory[key]
        print app[key]
        print ""
    }
}

Is there any way to get rid of the STR* and SMTGH* strings in order to get this output:

carl 1,10:00,14:00
applicaton,halfhour,onehours
theory,geo,programmation

carl 2,10:00,14:00
applicaton,nothing,nothing
theory,history,philosophy

david 1,10:00,14:00
applicaton,onehour,twohours
theory,nothing,nothing

david 2,10:00,14:00
applicaton,twohour,twohours
theory,math,music

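Solution

Using GNU awk (PROCINFO["sorted_in"] is a gawk extension, so other awk implementations will not sort the output by name):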

awk -F: -v OFS=, '
  /^at/ {
    split($0, f, " ")
    time = f[2]
    course = f[3] " " f[4]
    times[course] = times[course] OFS time
  }
  $2 == "oftheory"     {th[course] = th[course] OFS $(NF-1)}
  $2 == "ofapplicaton" {ap[course] = ap[course] OFS $(NF-1)}
  END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (c in times) {
      printf "%s%s\n", c, times[c]
      printf "application%s\n", ap[c]
      printf "theory%s\n", th[c]
      print ""
    }
  }
' file

carl 1,10:00,14:00
application,onehour,twohours
theory,nothing,nothing

carl 2,10:00,14:00
application,twohour,twohours
theory,math,music

david 1,10:00,14:00
application,halfhour,onehours
theory,geo,programmation

david 2,10:00,14:00
application,nothing,nothing
theory,history,philosophy
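
If gawk is not available, the same idea can be sketched in portable awk by remembering each name the first time it appears, instead of relying on PROCINFO["sorted_in"]. This is only a sketch under assumptions: the summarize function and the order array are names made up for this example, and the output follows first-appearance order rather than alphabetical order.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer):
# reads the log from the given files, or from stdin if none are given.
summarize() {
  awk -F: -v OFS=, '
    /^at / {
      # f[1]="at", f[2]=time, f[3]=name, f[4]=number
      split($0, f, " ")
      time = f[2]
      course = f[3] " " f[4]
      # record each name once, in order of first appearance
      if (!(course in times)) order[++n] = course
      times[course] = times[course] OFS time
    }
    # the value of interest is always the last non-empty ":" field
    $2 == "oftheory"     { th[course] = th[course] OFS $(NF-1) }
    $2 == "ofapplicaton" { ap[course] = ap[course] OFS $(NF-1) }
    END {
      for (i = 1; i <= n; i++) {
        c = order[i]
        printf "%s%s\n", c, times[c]
        printf "application%s\n", ap[c]
        printf "theory%s\n", th[c]
        print ""
      }
    }
  ' "$@"
}

# Usage on a small sample taken from the question
summarize <<'EOF'
at 10:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR758> [STR6 STR155] STR8:
academy/course3:oftheory:SMTGHK:geo:
academy/course3:ofapplicaton:SMTGHL:halfhour:

at 10:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR5> [STR6 STR7] STR8:
academy/course1:oftheory:SMTGHO:nothing:
academy/course1:ofapplicaton:SMTGHP:onehour:

at 14:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR11> [STR6 STR784] STR8:
academy/course5:oftheory:SMTGHG:nothing:
academy/course5:ofapplicaton:SMTGHH:twohours:
EOF
```

With this sample, david 1 is printed before carl 1 because it appears first in the input. If alphabetical order matters, piping the "at" lines through sort first or using the gawk version above is probably simpler.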
