Split file by number of lines and pattern in awk/perl


Question

I need to split a file into chunks based on an approximate number of lines (about 4 in this example, but thousands in reality). Each output file has to start with a pattern that also occurs many times within each chunk.

A block must start with START, must not end with START, and must be more than 3 lines long.

Input file:

START
LINE
LINE
START
LINE 
LINE 
START
LINE 
LINE 
LINE 
START
LINE 
START
LINE

Desired output files:

File 1

START
LINE
LINE
START
LINE
LINE

File 2

START
LINE 
LINE 
LINE 

File 3

START
LINE 
START
LINE

The problem with the following code is that the 2nd occurrence of /^START/ is included at the end of file 1, when it should be at the start of file 2. I can't work out how to get the output to switch to the next file when the next record matches /^START/. There is no end pattern that I can use.

awk '/^START/{f=1} f{ print $0 > "file_"n ; c++} c>3 && /^START/ { n++; c=1; close("file_"n) }' c=1 n=1 file

An awk or perl solution would be much appreciated!

Recommended answer

This produces the output that you want:

awk -v out=1 'NR>1 && ++i>3 && /^START/ {++out; i=0} {print > "file" out}' file

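When all of the conditions are satisfied (this is not the first line of the input, the current chunk already holds more than 3 lines, and the current line matches /^START/), out is incremented and the counter i is reset to 0. Because out is part of the output filename, the current line and everything after it go to a new file.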

Output:

$ cat file1
START
LINE
LINE
START
LINE
LINE
$ cat file2
START
LINE
LINE
LINE
$ cat file3
START
LINE
START
LINE
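
For readers who prefer the logic spelled out, here is a minimal sketch of the same idea written as a multi-line awk program. It is not the answerer's code. The variable names `min`, `count` and `name` are made up for illustration, and the `close()` call is an addition on the assumption that real inputs produce thousands of chunks, where some awk implementations could otherwise run out of open file handles. The script recreates the sample input from the question in a scratch directory so that it can be run as is.

```shell
#!/bin/sh
# Work in a scratch directory so the chunk files do not clutter the current one.
cd "$(mktemp -d)" || exit 1

# Recreate the sample input from the question.
printf '%s\n' START LINE LINE START LINE LINE START LINE LINE LINE \
              START LINE START LINE > file

# min is a hypothetical parameter: the number of lines a chunk must already
# hold before the next START line is allowed to open a new file.
awk -v min=4 '
    # Roll over only on a START line, and only once the chunk is long enough.
    /^START/ && count >= min {
        close(name)     # release the finished chunk before starting the next
        count = 0
        name = ""
    }
    # First line of a chunk: work out the next file name (file1, file2, ...).
    name == "" { name = "file" (++out) }
    # Every line is written to the current chunk and counted.
    {
        print > name
        count++
    }
' file

# Quick check of the chunk sizes.
wc -l file1 file2 file3
```

With the sample input, `wc -l` should report 6, 4 and 4 lines, matching the files shown above. The rollover test runs before the line is printed, which is what keeps each START line at the top of the new file rather than at the end of the old one.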
