Separating sections of a text file with a bash script

Problem description

I have a list:

    ### To Read:
    One Hundred Years of Solitude | Gabriel García Márquez
    Moby-Dick | Herman Melville
    Frankenstein | Mary Shelley
    On the Road | Jack Kerouac
    Eyeless in Gaza | Aldous Huxley
    ### Read:
    The Name of the Wind (The Kingkiller Chronicles: Day One) | Patrick Rothfuss | 6-27-2013
    The Wise Man’s Fear (The Kingkiller Chronicles: Day Two) | Patrick Rothfuss | 8-4-2013
    Vampires in the Lemon Grove | Karen Russell | 12-25-2013
    Brave New World | Aldous Huxley | 2-2014

I'd like to use something like Python's string.split(' | ') to split each line into separate fields, but since the two sections have different numbers of fields, I think I need to treat them differently. How do I select the lines between '### To Read:' and '### Read:', and the lines after '### Read:', and split them? Should I use awk or sed?

Recommended answer

You don't say what the final output should look like, but here is a skeleton for an Awk solution.

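    awk -F ' [|] ' '
        /^### To Read:/ { s = 1; next }       # header of the first section
        /^### Read:/    { s = 2; next }       # header of the second section
        s == 1 { print $1 "," $2 ",\"\"" }    # title, author, empty date
        s == 2 { print $1 "," $2 "," $3 }     # title, author, date
    ' file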

For lines in the first section, this simply prints an empty third field. You can adapt the actions to do anything you like, or rewrite this in Python if you are more familiar with that.
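If Python is more comfortable, the same state-machine idea might look roughly like the sketch below. It assumes the section headers appear exactly as in the question and that fields are separated by " | ". The function name `parse_sections` and the dictionary keys are made up for illustration.

```python
import io


def parse_sections(lines):
    """Group the ' | '-separated records under the section they belong to."""
    sections = {"to_read": [], "read": []}  # hypothetical key names
    current = None  # no section header seen yet
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("### To Read:"):
            current = "to_read"
        elif line.startswith("### Read:"):
            current = "read"
        elif current is not None:
            # Same split the question asks about: str.split(' | ')
            sections[current].append(line.split(" | "))
    return sections


# Shortened sample taken from the list in the question.
sample = """\
### To Read:
Moby-Dick | Herman Melville
Frankenstein | Mary Shelley
### Read:
Brave New World | Aldous Huxley | 2-2014
"""

result = parse_sections(io.StringIO(sample))
for title, author in result["to_read"]:
    print(f"to read: {title} by {author}")
for title, author, date in result["read"]:
    print(f"read:    {title} by {author} ({date})")
```

For a real file, passing an open file object (for example `open("file")`) instead of the `io.StringIO` sample should work the same way, since the function only iterates over lines.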
