Split file by number of lines and pattern in awk/perl
Question
I need to split a file into chunks based on an approximate number of lines (e.g. ~4 in the example, but thousands in reality), whilst each file has to start with a pattern that also occurs many times within each chunk.
A block needs to start with START
and not end with START
and be >3 lines long
Input file:
START
LINE
LINE
START
LINE
LINE
START
LINE
LINE
LINE
START
LINE
START
LINE
Desired output files:
File 1
START
LINE
LINE
START
LINE
LINE
File 2
START
LINE
LINE
LINE
File 3
START
LINE
START
LINE
The problem with the following code is that the 2nd occurrence of /^START/ is included at the end of file 1, when it should be at the start of file 2. I can't work out how to get the file to output when the next record is /^START/. There is no end pattern that I can use.
awk '/^START/{f=1} f{ print $0 > "file_"n ; c++} c>3 && /^START/ { n++; c=1; close("file_"n) }' c=1 n=1 file
An awk or perl solution would be much appreciated!
Recommended answer
This produces the output that you want:
awk -v out=1 'NR>1 && ++i>3 && /^START/ {++out; i=0} {print > "file" out}' file
When all of the conditions are satisfied, increment out, which is part of the output filename.
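The one-liner can be spelled out roughly as follows. This is only a sketch of the same logic, not part of the original answer: the names min and count and the sample file name input.txt are made up for illustration, and the close() call is an addition that may help when thousands of chunks would otherwise leave many output files open at once.

```shell
# Recreate the sample input from the question (input.txt is a hypothetical name).
printf '%s\n' START LINE LINE START LINE LINE START LINE LINE LINE \
              START LINE START LINE > input.txt

awk -v out=1 -v min=3 '
    # Start a new chunk only when all three conditions hold:
    #   - this is not the first line of the input,
    #   - the current chunk already holds more than "min" lines,
    #   - the current line begins with START.
    NR > 1 && ++count > min && /^START/ {
        close("file" out)   # addition: release the finished chunk
        ++out
        count = 0
    }
    # Every line is written to the current chunk.
    { print > ("file" out) }
' input.txt
```

Because the test runs before the print, a qualifying START line is already routed to the next file, which is what the code in the question was missing. Setting min to a few thousand should give chunks of the size mentioned in the question.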
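Output, using a copy of the input in which each line is prefixed with its number (from 0) to make the split points visible: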
$ cat file1
0 START
1 LINE
2 LINE
3 START
4 LINE
5 LINE
$ cat file2
6 START
7 LINE
8 LINE
9 LINE
$ cat file3
10 START
11 LINE
12 START
13 LINE
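To confirm that the chunks meet the three requirements from the question, a check along these lines may help. It is a sketch under a few assumptions: it recreates the unnumbered sample input under the hypothetical name input.txt, reruns the command from the answer, and expects the chunks to be named file1, file2, and so on in the current directory.

```shell
# Recreate the sample input and split it with the command from the answer.
printf '%s\n' START LINE LINE START LINE LINE START LINE LINE LINE \
              START LINE START LINE > input.txt
awk -v out=1 'NR>1 && ++i>3 && /^START/ {++out; i=0} {print > "file" out}' input.txt

# Check every chunk against the three requirements.
for f in file[0-9]*; do
    awk -v name="$f" '
        NR == 1 && !/^START/ { bad = 1 }    # must start with START
        { last = $0 }                       # remember the final line
        END {
            if (last ~ /^START/) bad = 1    # must not end with START
            if (NR <= 3) bad = 1            # must be longer than 3 lines
            print name ": " (bad ? "FAIL" : "ok")
        }
    ' "$f"
done
```

Note that the last chunk simply receives whatever lines remain, so on other inputs it may be shorter than the threshold and would be reported as FAIL by this check.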