Splitting one large file into many if the text in a column doesn't match the text in the one before it


Problem description

I searched for a while and couldn't find an answer to this. I have a standard TSV file in the following format:

1    100    101    350    A
1    101    102    300    A
1    102    103    180    A
1    800    801    60     B
1    801    802    70     B
1    802    803    82     B
1    975    976    105    C
1    976    977    108    C

And so on. This continues for a few million lines, and column 5 contains 1000 different regions (A, B, C, and so on). The regions differ in size, that is, in their number of lines. I would like to iterate over the file and write each region to its own file:

FileA.txt

1    100    101    350    A
1    101    102    300    A
1    102    103    180    A

FileB.txt

1    800    801    60     B
1    801    802    70     B
1    802    803    82     B

FileC.txt

1    975    976    105    C
1    976    977    108    C

Recommended answer

Using awk:

awk '{out = "File" $NF ".txt"; print >> out; close(out)}' file
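The one-liner builds the output name from the last field ($NF), appends the current line to that file, and closes it straight away, so awk never has more than one output file open at a time. A minimal way to try it on the sample from the question is sketched below; the scratch directory and the input name sample.tsv are assumptions made for the illustration, not part of the original question.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so no existing File*.txt is touched
# (hypothetical setup for this sketch).
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the sample rows from the question as a real tab-separated file.
printf '1\t100\t101\t350\tA\n'  > sample.tsv
printf '1\t101\t102\t300\tA\n' >> sample.tsv
printf '1\t102\t103\t180\tA\n' >> sample.tsv
printf '1\t800\t801\t60\tB\n'  >> sample.tsv
printf '1\t801\t802\t70\tB\n'  >> sample.tsv
printf '1\t802\t803\t82\tB\n'  >> sample.tsv
printf '1\t975\t976\t105\tC\n' >> sample.tsv
printf '1\t976\t977\t108\tC\n' >> sample.tsv

# The command from the answer: append each line to File<region>.txt,
# closing the file after every write.
awk '{out = "File" $NF ".txt"; print >> out; close(out)}' sample.tsv

# Show how many lines ended up in each region file.
wc -l File*.txt
```

Because every write reopens the file in append mode, this form should also cope with regions whose lines are not adjacent in the input, at the cost of one open and close per line.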

More efficient, not closing the destination file after every line:

awk '
    $NF != dest {if (out) close(out); dest = $NF; out = "File" dest ".txt"} 
    {print >> out}
' file
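One thing to keep in mind with both commands is that >> appends, so running them a second time in the same directory adds the lines again to the files left over from the first run. The sketch below wraps the second command in a small function that clears old outputs first. The function name split_regions, the regions output directory, and sample.tsv are hypothetical names chosen for this example.

```shell
#!/bin/sh
set -eu

# Throwaway directory and a shortened sample input (assumptions for this sketch).
workdir=$(mktemp -d)
cd "$workdir"
printf '1\t100\t101\t350\tA\n1\t101\t102\t300\tA\n1\t800\t801\t60\tB\n' > sample.tsv

# split_regions INPUT OUTDIR
# Writes one OUTDIR/File<region>.txt per value of the last column.
split_regions() {
    mkdir -p "$2"
    # Remove results of earlier runs, because >> would append to them.
    rm -f "$2"/File*.txt
    # Same logic as the answer's second command, with the output
    # directory passed in through the awk variable "dir".
    awk -v dir="$2" '
        $NF != dest {if (out) close(out); dest = $NF; out = dir "/File" dest ".txt"}
        {print >> out}
    ' "$1"
}

split_regions sample.tsv regions
split_regions sample.tsv regions   # a second run does not double the lines

wc -l regions/File*.txt
```

Keeping the outputs in their own directory also makes the cleanup pattern File*.txt less likely to match unrelated files.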
