Splitting a large text file on every blank line

Question

I'm having a bit of trouble splitting a large text file into multiple smaller ones. The format of my text file is the following:

dasdas #42319 blaablaa 50 50
content content
more content
content conclusion

asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion

asdasd #299 yadayada 60 40
content
content
contend done
...and so on

A typical information table in my file has between 10 and 40 rows.

I would like this file to be split into n smaller files, where n is the number of content tables.

That is,

dasdas #42319 blaablaa 50 50
content content
more content
content conclusion

would be its own separate file (whateverN.txt), and

asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion

would again be a separate file, whateverN+1.txt, and so forth.

It seems like awk or Perl are nifty tools for this, but since I have never used them before, the syntax is rather baffling.

I found these two questions, which correspond almost exactly to my problem, but I failed to modify the syntax to fit my needs:

"Split text file into multiple files" and "How can I split a text file into multiple text files?" (on Unix & Linux)

How should one modify the command-line inputs so that they solve my problem?

Recommended answer

Setting RS to the null (empty) string tells awk to treat runs of one or more blank lines as the record separator. You can then use NR, the number of the current record, to build the output file name for each record:

 awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt

RS: This is awk's input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines, or a regexp, in which case records are separated by matches of the regexp in the input text.
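To make the "runs of blank lines" behaviour concrete, here is a small sketch. The input is made up for illustration (it is not the asker's file): it starts with two blank lines and puts three blank lines between its two blocks. With RS set to the empty string, awk should still see exactly two records.

```shell
# Hypothetical input: two leading blank lines, then two blocks
# separated by three blank lines.
result=$(printf '\n\nalpha 1\nbeta\n\n\n\ngamma 2\n' |
    awk -v RS= '{ printf "record %d starts with: %s\n", NR, $1 }')

# Show what awk reported for each record.
printf '%s\n' "$result"
```

This prints "record 1 starts with: alpha" and "record 2 starts with: gamma": the leading blank lines do not produce an empty record, and the run of three blank lines counts as a single separator.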

$ cat file.txt
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion

asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion

asdasd #299 yadayada 60 40
content
content
contend done

$ awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt

$ ls whatever-*.txt
whatever-1.txt  whatever-2.txt  whatever-3.txt

$ cat whatever-1.txt 
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion

$ cat whatever-2.txt 
asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion

$ cat whatever-3.txt 
asdasd #299 yadayada 60 40
content
content
contend done
$ 
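
For a really large input with many blocks, a slightly more defensive variant may be worth considering. The one-liner leaves every output file open until awk exits, and some awk implementations can then fail with a "too many open files" error. The sketch below closes each file as soon as it is written. The zero-padded file names, the temporary directory, and the sample input are assumptions added for illustration; they are not part of the original answer.

```shell
# Work in a throwaway directory so the demo does not clutter the current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: three blocks separated by blank lines.
printf 'a 1\nb\n\nc 2\nd\n\n\ne 3\n' > file.txt

# Same splitting as above, but close each output file once it is written
# and zero-pad the counter so the names sort in record order.
awk -v RS= '{
    out = sprintf("whatever-%04d.txt", NR)
    print > out
    close(out)
}' file.txt

ls whatever-*.txt
```

With this sample input, the listing shows whatever-0001.txt, whatever-0002.txt and whatever-0003.txt. Note that a separator line containing only spaces or tabs is not empty, so it would likely not split records; such lines would need to be cleaned up first.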
