Splitting one large file into many if the text in a column doesn't match the text in the one before it
Problem description
I searched for a while and couldn't find an answer to this. I have a standard TSV file with the following format:
1 100 101 350 A
1 101 102 300 A
1 102 103 180 A
1 800 801 60 B
1 801 802 70 B
1 802 803 82 B
1 975 976 105 C
1 976 977 108 C
and so on. This continues for a few million lines, and column 5 holds 1000 different regions (A, B, C, ...). The regions vary in size, in terms of number of lines. I would like to iterate over the file and write each region to its own file:
FileA.txt
1 100 101 350 A
1 101 102 300 A
1 102 103 180 A
FileB.txt
1 800 801 60 B
1 801 802 70 B
1 802 803 82 B
FileC.txt
1 975 976 105 C
1 976 977 108 C
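If you want to try the commands in the answer, you can first write the sample rows above to a tab-separated file. This is a minimal sketch. The input name `file` is an assumption, chosen so that it matches the filename the awk commands read.

```shell
# Recreate the sample from the question as a tab-separated file.
# The filename "file" is an assumption matching the awk commands in the answer.
# printf reuses its format string until all arguments are consumed,
# so every group of five arguments becomes one tab-separated line.
printf '%s\t%s\t%s\t%s\t%s\n' \
  1 100 101 350 A \
  1 101 102 300 A \
  1 102 103 180 A \
  1 800 801 60 B \
  1 801 802 70 B \
  1 802 803 82 B \
  1 975 976 105 C \
  1 976 977 108 C > file
```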
Recommended answer
Using awk, building each output filename from the last field ($NF), which holds the region:
awk '{out = "File" $NF ".txt"; print >> out; close(out)}' file
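This command opens the output in append mode (`>>`) and closes it after every line. As a result, at most one output file is open at a time, and the rows of a region do not have to be adjacent.

Below is a quick check in a scratch directory. It assumes `mktemp -d` is available and uses a made-up three-line input in which region A appears both before and after region B.

```shell
# Work in a scratch directory so no existing File*.txt is touched.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Made-up input: region A appears on both sides of region B.
printf '1\t100\t101\t350\tA\n1\t800\t801\t60\tB\n1\t101\t102\t300\tA\n' > file

# The one-liner from the answer: append each line to File<last field>.txt,
# then close it, so only one output file is open at any moment.
awk '{out = "File" $NF ".txt"; print >> out; close(out)}' file

cat FileA.txt   # both A rows, in input order
cat FileB.txt   # the single B row
```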
More efficient: instead of closing the destination file after every line, close it only when the region in the last column changes:
awk '
$NF != dest {if (out) close(out); dest = $NF; out = "File" dest ".txt"}
{print >> out}
' file
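Because both commands use `>>`, rerunning either one appends to any File*.txt left over from a previous run rather than replacing it. The sketch below runs the second command twice in a scratch directory, then deletes the old output and runs it once more. It assumes `mktemp -d` is available. The three-line input and the `split_by_region` shell function are hypothetical.

```shell
# Scratch directory, so the File*.txt glob below cannot hit unrelated files.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Hypothetical input, already grouped by region as in the question.
printf '1\t100\t101\t350\tA\n1\t101\t102\t300\tA\n1\t800\t801\t60\tB\n' > file

# Hypothetical wrapper around the second command from the answer:
# the previous output file is closed only when the last field changes.
split_by_region() {
  awk '
  $NF != dest {if (out) close(out); dest = $NF; out = "File" dest ".txt"}
  {print >> out}
  ' file
}

split_by_region
split_by_region        # appends again: FileA.txt now holds 4 lines
wc -l < FileA.txt

rm -f File*.txt        # clear earlier output before a fresh run
split_by_region
wc -l < FileA.txt      # back to 2 lines
```

In a real working directory, the `File*.txt` glob may match files that did not come from this input, so check what it expands to before deleting anything.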