Awk: Splitting file on nth occurrence of delimiter, wrong first split file
Problem description
I want to split a text file like the one pasted below (sorry for the length) on every nth occurrence of ">". For example, on every 2nd occurrence of ">", but I need to be able to change that number.
test_split.txt:
>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b
>
c
c
> c
d
d
d
d
d
>3
>cr
>c3
e
e
e
e
e
> 5
f
f
f
f
>cr
g
g
g
g
> cr dkjfddf
h
h
h
h
So I want output files like these (only showing the first two):
file_1.txt:
>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b
file_2.txt:
>
c
c
> c
d
d
d
d
d
etc.
Problem:
I have been trying to achieve that result using this awk command:
awk '/^>/ {n++} { file = sprintf("file_%s.txt", int(n/2)); print >> file; }' < test_split.txt
Instead of the desired result, I get correct output (split) files except for the first one, which contains only one occurrence of ">" (instead of two), like this:
cat test_0.txt
>eeefkdfn
a
a
a
cat test_1.txt
>chr1 4ufjdhf
b
b
b
b
>
c
c
Any idea why that is? Thank you!
Recommended answer
This seems simpler:
awk 'BEGIN{i=1}/^>/{cont++}cont==3{i++;cont=1}{print > "file_"i".txt"}' file
This gives you the expected result:
$ cat file_1.txt
>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b
$ cat file_2.txt
>
c
c
> c
d
d
d
d
d
Explanation
BEGIN{i=1}
: Initializes the file counter.
/^>/{cont++}
: Counts each > line found.
cont==3{i++;cont=1}
: On every third > line, increments the file counter and resets cont to 1, so that this > line counts as the first one of the next file.
{print > "file_"i".txt"}
: Directs the output to the expected file.
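As for why the original command produced a short first file: n is 1 after the first > line, so int(n/2) is 0, while n=2 and n=3 both give 1. Every group is therefore shifted by one record. The following is a minimal sketch of one way to correct that offset and make the group size adjustable, assuming the size is passed in with -v n=... (the names n, count and target, the scratch directory and the shortened sample input are illustrative and not part of the answer above):

```shell
# Work in a scratch directory so the generated files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A shortened sample in the same shape as test_split.txt (five ">" records).
printf '%s\n' '>eeefkdfn' a a '>c 4ufjdhf' b b '>' c '> c' d d '>3' e > test_split.txt

# n is the number of ">" records per output file; change it with -v n=...
# count is 1-based, so int((count - 1) / n) + 1 maps records 1..n to file 1,
# records n+1..2n to file 2, and so on.
awk -v n=2 '
    /^>/ {
        count++
        target = "file_" (int((count - 1) / n) + 1) ".txt"
        if (target != file) {
            if (file != "") close(file)   # keep only one output file open
            file = target
        }
    }
    file != "" { print > file }           # skip any lines before the first ">"
' test_split.txt

# Show how many ">" records landed in each file.
grep -c '^>' file_*.txt
```

With n=2 and the five-record sample, the first two files each receive two > records and the last file receives the remaining one. Closing each finished file is optional for small inputs, but it may help with awk implementations that limit the number of simultaneously open files.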