sed recipe: how to do stuff between two patterns that can be either on one line or on two lines?
Question
Let's say we want to do some substitutions only between some patterns; let them be <a> and </a> for clarity... (all right, all right, they're start and end!.. Jeez!)
So I know what to do if start and end always occur on the same line: just design a proper regex.
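As a minimal sketch of the same-line case, assuming GNU sed (the `\U` and `\E` case-conversion escapes are GNU extensions) and a made-up input line, a single regex can capture the text between start and end and change only that part:

```shell
# Uppercase only the text between "start" and "end" on one line.
# \U switches uppercasing on for group 2, \E switches it off again.
out=$(echo 'a start foo end b' |
  sed 's/\(start\)\(.*\)\(end\)/\1\U\2\E\3/')
printf '%s\n' "$out"
```

Because `.*` is greedy, this treats everything from the first start to the last end on the line as one region.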
I also know what to do if they're guaranteed to be on different lines, I don't care about anything in the line containing end, and I'm also OK with the commands being applied to the part of the start line that comes before start: just specify the address range as /start/,/end/.
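A small illustration of the address-range form, using made-up input lines; it also shows the limitation mentioned above, since the substitution hits the whole of the start and end lines, not just the text between the two markers:

```shell
# Replace "foo" with "bar" on every line from the one matching /start/
# through the one matching /end/, inclusive.
out=$(printf '%s\n' 'x foo' 'start foo' 'foo' 'end foo' 'foo' |
  sed '/start/,/end/s/foo/bar/')
printf '%s\n' "$out"
```

The first and last input lines are outside the range and stay as they are, while the foo that follows end on the fourth line is changed even though it lies after the closing marker.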
This, however, doesn't sound very useful. What if I need to do a smarter job, for instance, introduce changes inside a {...} block?
One thing I can think of is breaking the input on { and } before processing and putting it back together afterwards:
sed 's/{\|}/\n/g' input | sed 'main stuff' | sed ':a $!{N;ba}; s/\n\(}\|{\)\n/\1/g'
Another option is the opposite:
cat input | tr '\n' '#' | sed 'whatever; s/#/\n/g'
Both of these are ugly, mainly because the operations are not confined within a single command. The second one is even worse because one has to use some character or substring as a "newline holder", assuming it isn't present in the original text.
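To make the second approach concrete, here is a small sketch with made-up input, assuming GNU sed (for `\U` and for `\n` in the replacement) and assuming `#` never occurs in the data, which is exactly the weakness described above:

```shell
# Join all lines with "#" as a newline holder, uppercase each {...} block,
# then turn the placeholders back into newlines.
out=$(printf '%s\n' 'x {' 'this' '} y' |
  tr '\n' '#' |
  sed 's/{[^}]*}/\U&/g; s/#/\n/g')
printf '%s\n' "$out"
```

If the input already contained a `#`, it would come back as a newline, so the placeholder has to be chosen per input.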
So the question is: are there better ways, or can the above-mentioned ones be optimized? This is quite a regular task from what I read in recent SO questions, so I'd like to choose the best practice once and for all.
P.S. I'm mostly interested in pure sed solutions: can the job be done with one invocation of sed and nothing else? Please no awk, Perl, etc.: this is more of a theoretical question, not a "need the job done asap" one.
Recommended answer
This might work for you:
# create multiline test data
cat <<\! >/tmp/a
this
this { this needs
changing to
that } that
that
!
sed '/{/!b;:a;/}/!{$q;N;ba};h;s/[^{]*{//;s/}.*//;s/this\|that/\U&/g;x;G;s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/' /tmp/a
this
this { THIS needs
changing to
THAT } that
that
# convert multiline test data to a single line
tr '\n' ' ' </tmp/a >/tmp/b
sed '/{/!b;:a;/}/!{$q;N;ba};h;s/[^{]*{//;s/}.*//;s/this\|that/\U&/g;x;G;s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/' /tmp/b
this this { THIS needs changing to THAT } that that
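Explanation:
- Read the data into the pattern space (PS).
/{/!b;:a;/}/!{$q;N;ba}
- Copy the data into the hold space (HS).
h
- Strip the non-data from the front and back of the string.
s/[^{]*{//;s/}.*//
- Convert the data, e.g.
s/this\|that/\U&/g
- Swap to the HS and append the converted data.
x;G
- Replace the old data with the converted data.
s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/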
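The steps above can also be laid out as a commented, multi-line script. This is only a reformatting sketch of the same one-liner, run on the same test data and assuming GNU sed (for `\|` and `\U`):

```shell
out=$(printf '%s\n' 'this' 'this { this needs' 'changing to' 'that } that' 'that' |
sed '
  # no "{" on this line: print it unchanged and start the next cycle
  /{/!b
  # keep appending lines until the pattern space contains a "}"
  :a
  /}/!{
    $q
    N
    ba
  }
  # save the untouched text in the hold space
  h
  # strip everything up to the first "{" and from the first "}" onwards
  s/[^{]*{//
  s/}.*//
  # the actual edit: uppercase "this" and "that"
  s/this\|that/\U&/g
  # bring back the original and append the edited block after a newline
  x
  G
  # swap the edited block in for the original one
  s/{[^}]*}\([^\n]*\)\n\(.*\)/{\2}\1/
')
printf '%s\n' "$out"
```

Only the first {...} block of each collected chunk is processed this way, which is what the more complicated answer tries to address.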
A more complicated answer, which I think caters for more than one block per line:
# slurp file into pattern space (PS)
:a
$! {
N
ba
}
# check for presence of \v if so quit with exit value 1
/\v/q1
# replace original newlines with \v's
y/\n/\v/
# append a newline to PS as a delimiter
G
# copy PS to hold space (HS)
h
# starting from right to left delete everything but blocks
:b
s/\(.*\)\({.*}\).*\n/\1\n\2/
tb
# delete any non-block details from the start of the file
s/.*\n//
# PS contains only block details
# do any block processing here e.g. uppercase this and that
s/th\(is\|at\)/\U&/g
# append ps to hs
H
# swap to HS
x
# replace each original block with its processed one from right to left
:c
s/\(.*\){.*}\(.*\)\n\n\(.*\)\({.*}\)/\1\n\n\4\2\3/
tc
# delete newlines
s/\n//g
# restore original newlines
y/\v/\n/
# done!
N.B. This uses GNU-specific options but could be tweaked to work with generic seds.