Splitting a file in Linux based on content

Views: 110 Published: 2016/7/28 14:51:14 Tags: linux file bash sed awk

Question

I have an email dump of around 400 MB. I want to split it into .txt files, with one email per file. Every email starts with the standard HTML header specifying the doctype.

This means I will have to split the file on that header. How do I go about this in Linux?

Recommended answer

Suppose you have a file mail.txt:

$ cat mail.txt
<html>
    mail A
</html>

<html>
    mail B
</html>

<html>
    mail C
</html>

Then run csplit to split on each <html> line:

$ csplit mail.txt '/^<html>$/' '{*}'

 - mail.txt    => input file
 - /^<html>$/  => pattern match every `<html>` line
 - {*}         => repeat the previous pattern as many times as possible

Check the output:

$ ls
mail.txt  xx00  xx01  xx02  xx03
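
In the listing above, xx00 is empty because the input begins with a matching line, and the pieces have no .txt extension. The following is a minimal sketch of one way to get named, non-empty pieces, assuming GNU coreutils csplit (the -z, -b and '{*}' forms are GNU extensions). The mail_ prefix and the scratch directory are arbitrary choices for illustration, not part of the original answer.

```shell
# Work in a scratch directory so the pieces do not mix with other files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the three-message sample used in the answer.
printf '<html>\n    mail A\n</html>\n\n<html>\n    mail B\n</html>\n\n<html>\n    mail C\n</html>\n' > mail.txt

# -s  suppress the byte counts csplit normally prints
# -z  drop empty pieces (here, the empty piece before the first <html>)
# -f  output file prefix (mail_ is a hypothetical name)
# -b  printf-style suffix, giving mail_0000.txt, mail_0001.txt, ...
csplit -s -z -f mail_ -b '%04d.txt' mail.txt '/^<html>$/' '{*}'

ls
```

For the real dump, the pattern would need to match whatever line actually opens each message; if that is a doctype line, something like '/^<!DOCTYPE/' may work, but the exact header text is not given in the question.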

If you want to do it in awk:

$ awk '/<html>/{filename=NR".txt"}; {print >filename}' mail.txt
$ ls
1.txt  5.txt  9.txt  mail.txt
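
The one-liner above keeps every output file open until awk exits, which can hit the open-file limit on a dump containing thousands of messages, and it fails if any text precedes the first <html> line. Below is a hedged variant that closes each file before starting the next and numbers the pieces sequentially. The mail_NNNN.txt naming and the variable names out and n are assumptions made for this sketch.

```shell
# Work in a scratch directory so the pieces do not mix with other files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the three-message sample used in the answer.
printf '<html>\n    mail A\n</html>\n\n<html>\n    mail B\n</html>\n\n<html>\n    mail C\n</html>\n' > mail.txt

awk '
    /<html>/ {
        if (out != "") close(out)            # release the previous file handle
        out = sprintf("mail_%04d.txt", ++n)  # mail_0001.txt, mail_0002.txt, ...
    }
    out != "" { print > out }                # ignore anything before the first header
' mail.txt

ls
```

Closing each file as soon as the next header appears keeps only one output file open at a time, so the number of messages in the dump no longer matters.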
