Efficiently splitting one file into several files by value of column
Question
I have a tab-delimited text file that is very large. Many lines in the file have the same value for one of the columns in the file (call it column k). I want to separate this file into multiple files, putting entries with the same value of k in the same file. How can I do this? For example:
a foo
1 bar
c foo
2 bar
d foo
should be split into a file "foo" containing the entries "a foo" and "c foo" and "d foo" and a file called "bar" containing the entries "1 bar" and "2 bar".
How can I do this in either a shell script or in Python?
Thanks.
Recommended answer
I'm not sure how efficient it is, but the quick and easy way is to take advantage of the way file redirection works in awk:
awk '{ print >> $5 }' yourfile
That will append each line (unmodified) to a file named after column 5. Adjust as necessary: for the example in the question, the key is in the second column, so you would use $2.
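Since the question also asks about Python, here is a minimal sketch of the same idea, not a tuned implementation. The function name `split_by_column` and its parameters are made up for illustration. It assumes the input is tab-delimited, that every line has the key column, and that key values are safe to use as file names (no path separators). It keeps one open handle per distinct key, so a file with a very large number of distinct keys could hit the operating system's open-file limit.

```python
import os


def split_by_column(path, column, out_dir="."):
    """Append each line of `path` to a file named after one of its fields.

    `column` is the 0-based index of the tab-separated key field.
    Returns the sorted list of key values that were seen.
    Hypothetical helper: assumes keys are valid file names.
    """
    handles = {}  # key value -> open output file
    try:
        with open(path, encoding="utf-8") as src:
            for line in src:
                # Strip only the newline so empty trailing fields survive.
                key = line.rstrip("\n").split("\t")[column]
                if key not in handles:
                    # Append mode mirrors awk's `>>` redirection.
                    handles[key] = open(
                        os.path.join(out_dir, key), "a", encoding="utf-8"
                    )
                # Write the original line unmodified.
                handles[key].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

For the example in the question, `split_by_column("yourfile", 1)` would create the files "foo" and "bar" in the current directory. Because the files are opened in append mode, running it twice appends the same lines again, just as the awk version does.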