Sort words per section

Question

I have this text file that I need to sort per section.

#cat raw_file.txt

== other info ==
===instructions===
===english words===
this
is
only
test


=== missing words ===

==== include words ====
some
more
words

==== customer name ====
ram
sham
amar
akbar
anthony

==== cities ====
mumbai
delhi
pune


=== prefix ===

the
a
an

If I sort it "as is", all the headings (the lines starting with equals signs) come first, followed by all the words in a single list. How do I sort the words within each section separately?

# sort raw_file.txt

== other info ==
=== missing words ===
=== prefix ===
==== cities ====
==== customer name ====
==== include words ====
===english words===
===instructions===
a
akbar
amar
an
anthony
delhi
is
more
mumbai
only
pune
ram
sham
some
test
the
this
words

This is MediaWiki format, if that matters. I am currently sorting every section one at a time, and that is taking a lot of time.

#cat expected_output.txt

== other info ==
===instructions===
===english words===
is
only
test
this

=== missing words ===

==== include words ====
more
some
words

==== customer name ====
akbar
amar
anthony
ram
sham

==== cities ====
delhi
mumbai
pune

=== prefix ===
a
an
the

Recommended answer

If you're not worried about keeping the blank lines, you could use:

awk '/=/ {c++} {print c+1, $0}' file.txt | sort -n | cut -d' ' -f2- | sed '/^$/d'
>== other info ==
>===instructions===
>===english words===
>is
>only
>test
>this
>=== missing words ===
>==== include words ====
>more
>some
>words
>==== customer name ====
>akbar
>amar
>anthony
>ram
>sham
>==== cities ====
>delhi
>mumbai
>pune
>=== prefix ===
>a
>an
>the

This approach works by prefixing every line with an index number that is incremented by one each time a line contains an '='. The lines are then sorted by the index number first and by the text second, after which the index is cut off and the blank lines (which end up at the top of each section after the sort) are deleted. Within a section the heading stays above its words because '=' sorts before letters, as in the plain `sort` output shown in the question.
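The same decorate, sort, undecorate idea can be sketched as a small shell function. This is only an illustration under a few assumptions that are not in the original answer: headings are taken to be lines that start with `==`, an extra 0/1 flag is added as a second sort key so a heading never depends on collation order to stay above its words, and `LC_ALL=C` is set so the result does not vary with the locale. The name `sort_sections` is hypothetical.

```shell
#!/bin/sh
# Sketch only: sort the words inside each section, keeping headings in place.
# sort_sections is a hypothetical helper name, not part of the original answer.
# Usage: sort_sections raw_file.txt
sort_sections() {
    # Decorate: prefix each line with "<section number> <flag> ",
    # where flag 0 marks a heading and flag 1 marks an ordinary word.
    awk '
        /^[[:space:]]*$/ { next }                      # skip blank lines
        /^==/            { c++; print c, 0, $0; next } # heading: new section
                         { print c + 0, 1, $0 }        # word in current section
    ' "$1" |
    # Sort by section number, then by flag, then by the text itself.
    LC_ALL=C sort -t ' ' -k1,1n -k2,2n -k3 |
    # Undecorate: drop the two helper fields, keep the rest of the line.
    cut -d ' ' -f3-
}
```

Like the one-liner above, this drops blank lines. Because the ordering is done in the C locale, uppercase words sort before lowercase ones.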

I just saw @Bing Wang's comment; this is basically what he suggested you do.
