Process 10 lines of the sample data at a time


Problem description

I would like to write a loop that takes 10 lines of my input file at a time and writes the result to an output file, appending to the output file rather than overwriting it.

Here is the sample data:

FilePath    Filename    Probability ClassifierID    HectorFileType  LibmagicFileType

/mnt/Hector/Data/benign/binary/benign-pete/ 01d0cd964020a1f498c601f9801742c1    19  S040PDFv02  data.pdf    PDF document

/mnt/Hector/Data/benign/binary/benign-pete/ 0299a1771587043b232f760cbedbb5b7    0   S040PDFv02  data.pdf    PDF document

I then use this to count each unique file and show how many of each there are:

cut -f 4 input.txt|sort| uniq -c | awk '{print $2, $1}' | sed 1d

So ultimately I just need help writing a loop that runs that line of bash on 10 lines of data at a time and writes the results to an output file.

Recommended answer

If I understand correctly, for every block of 10 lines, you are trying to:

  1. Skip the header, which is the first line of the block
  2. Count how many times each value of field #4 (ClassifierID) occurs, and output that value and its count.

Here is an AWK script that does this:

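# Count field 4 on every line except the first line of each 10-line block.
FNR % 10 != 1 {
    ++count[$4]
}

# At the end of each 10-line block, print the tallies and reset them.
FNR % 10 == 0 {
    for (i in count) {
        print i, count[i]
        delete count[i]
    }
}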
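The question also asks for results to be appended rather than overwritten. A minimal sketch of one way to run the script, assuming it is saved as count_blocks.awk (a hypothetical filename) and that the input is tab-separated, as the cut -f 4 in the question suggests. The input rows and the /tmp/sample/ path are made up for the demonstration; only the ClassifierID value comes from the sample data.

```shell
# Work in a scratch directory so no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Save the AWK program under a hypothetical name.
cat > count_blocks.awk <<'EOF'
FNR % 10 != 1 { ++count[$4] }
FNR % 10 == 0 {
    for (i in count) {
        print i, count[i]
        delete count[i]
    }
}
EOF

# Build a made-up, tab-separated input: 1 header line plus 9 data lines.
printf 'FilePath\tFilename\tProbability\tClassifierID\tHectorFileType\tLibmagicFileType\n' > input.txt
for n in 1 2 3 4 5 6 7 8 9; do
    printf '/tmp/sample/\tfile%s\t0\tS040PDFv02\tdata.pdf\tPDF document\n' "$n" >> input.txt
done

# -F '\t' splits on tabs; >> appends to output.txt instead of overwriting it.
# Running the command twice shows that the second run adds to the first.
awk -F '\t' -f count_blocks.awk input.txt >> output.txt
awk -F '\t' -f count_blocks.awk input.txt >> output.txt

cat output.txt
```

Note that a final block shorter than 10 lines is tallied but never printed by this script, because FNR % 10 == 0 never becomes true for it.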

Discussion

  • The FNR % 10 != 1 block processes every line except lines 1, 11, 21, ..., which are the header lines you want to skip. This block keeps a count of each value of field $4.
  • The FNR % 10 == 0 block prints a summary for the current block of 10 lines and resets the counts (via delete).
  • The script does not sort the fields, so the output order may vary.
  • If you want to tally the whole file rather than blocks of 10, replace FNR % 10 == 0 with END.
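The whole-file variant and the sorting caveat can be sketched together. This is an illustration under stated assumptions, not the answerer's exact script: it assumes the file has a single header on line 1, so it skips only that line with FNR > 1 (the FNR % 10 != 1 test would also skip lines 11, 21, ...), and it pipes through sort to get a stable order. The input rows and the DEMO_B identifier are made up.

```shell
# Work in a scratch directory so no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up input: one header line, then 12 data lines with two ClassifierID values.
printf 'FilePath\tFilename\tProbability\tClassifierID\n' > input.txt
for n in 1 2 3 4 5 6 7 8; do
    printf '/tmp/sample/\tfile%s\t0\tS040PDFv02\n' "$n" >> input.txt
done
for n in 9 10 11 12; do
    printf '/tmp/sample/\tfile%s\t0\tDEMO_B\n' "$n" >> input.txt
done

# Tally field 4 over the whole file, skipping only line 1 (assumed single header),
# then sort so the output order no longer depends on AWK's array traversal.
awk -F '\t' '
FNR > 1 { ++count[$4] }
END     { for (i in count) print i, count[i] }
' input.txt | sort > totals.txt

cat totals.txt
```

Because the counts are printed only once, in END, a trailing group of fewer than 10 lines is included in the totals.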
