Splitting large text file on every blank line
Question
I'm having a bit of trouble splitting a large text file into multiple smaller ones. The layout of my text file is as follows:
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion

asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion

asdasd #299 yadayada 60 40
content
content
contend done

...and so on
A typical information table in my file has between 10 and 40 rows.
I would like this file to be split into n smaller files, where n is the number of content tables.
That is,
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion
would be its own separate file (whateverN.txt)
and
asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion
would again be a separate file, whateverN+1.txt, and so forth.
It seems like awk or Perl would be nifty tools for this, but I have never used them before and find the syntax kinda baffling.
I found these two questions that correspond almost exactly to my problem, but I failed to modify the syntax to fit my needs:
Split text file into multiple files & How can I split a text file into multiple text files? (on Unix & Linux)
How should I modify the command-line input so that it solves my problem?
Recommended answer
Setting RS to the null (empty) string tells awk to use one or more blank lines as the record separator. Then you can simply use NR, the number of the current record, to build the name of the file for each record:
awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt
RS: This is awk's input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines, or a regexp, in which case records are separated by matches of the regexp in the input text.
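To see how this "paragraph mode" groups lines into records, here is a minimal sketch. The scratch directory and the file name sample.txt are made up for illustration; note that the two consecutive blank lines after the first block still count as a single separator.

```shell
# Build a small sample input in a scratch directory (hypothetical names).
workdir=$(mktemp -d)
printf 'a 1\nb 2\n\n\nc 3\nd 4\n\ne 5\n' > "$workdir/sample.txt"

# With RS set to the empty string, each blank-line-separated block is one record.
# NR is the record number, NF the field count, $1 the first field of the record.
summary=$(awk -v RS= '{ print NR ": " NF " fields, starts with " $1 }' "$workdir/sample.txt")
printf '%s\n' "$summary"
# 1: 4 fields, starts with a
# 2: 4 fields, starts with c
# 3: 2 fields, starts with e
```

Each record spans several lines, so NF counts the fields of the whole block, not of a single line.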
$ cat file.txt
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion

asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion

asdasd #299 yadayada 60 40
content
content
contend done
$ awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt
$ ls whatever-*.txt
whatever-1.txt whatever-2.txt whatever-3.txt
$ cat whatever-1.txt
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion
$ cat whatever-2.txt
asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion
$ cat whatever-3.txt
asdasd #299 yadayada 60 40
content
content
contend done
$
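If the input contains a very large number of blocks, some awk implementations may run into a limit on simultaneously open output files, since the one-liner never closes the files it writes. A hedged variant, not part of the original answer, closes each file right after writing it. The scratch directory and sample contents below are assumptions for illustration only:

```shell
# Scratch directory and sample input (hypothetical, shaped like file.txt above).
outdir=$(mktemp -d)
printf 'h1\nc1\n\nh2\nc2\n\nh3\nc3\n' > "$outdir/file.txt"

# Same split as before, but each output file is closed as soon as it is written,
# so only one output file is open at any time.
(
  cd "$outdir" &&
  awk -v RS= '{
    out = "whatever-" NR ".txt"   # one output file per record
    print > out
    close(out)
  }' file.txt
)

ls "$outdir"
```

The output files are identical to those of the shorter command; the only difference is that awk releases each file before moving on to the next record.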