awk script for replacing multiple occurrences of a string pattern on the same line, across different files, with a number matching the string


Question

I need an awk script that searches for any string inside <>. If it finds a string it has not seen before, it should replace it with the current value of an index counter (0 at the beginning) and increment the counter. If it finds a string inside <> that it already knows, it should look up that string's index and replace the string with it. This should work across multiple files: the counter is reset only at program startup, not when moving from one file to the next.

For example, file_a.txt:

123abc<abc>xyz
efg
<b>ah
a<c>, <abc>
<c>b
(<abc>, <b>)

file_b.txt:

xyz(<c>, <b>)
xyz<b>xy<abc>z

should become

file_a_new.txt:

123abc<0>xyz
efg
<1>ah
a<2>, <0>
<2>b
(<0>, <1>)

file_b_new.txt:

xyz(<2>, <1>)
xyz<1>xy<0>z

What I have so far:

awk 'match($0, /<[^>]+>/) {
   k = substr($0, RSTART, RLENGTH)
   if (!(k in freq))
      freq[k] = n++
   $0 = substr($0, 1, RSTART-1) freq[k] substr($0, RSTART+RLENGTH)
}
{
   print $0 > (FILENAME ".tmp")
}' files

This detects only one <> pattern per line, but a line can contain several. How should I change the code?

The original files should not be edited; new files should be created instead.

Recommended answer

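With GNU awk (gawk) this is easier if you set RS to a regex that matches the <key> string. Each match then acts as a record terminator, and gawk stores the matched text in RT: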

awk -v RS='<[^>]+>' '{ ORS="" }  # default: print nothing after a record
RT {                                        # RT holds the text that matched RS
   if (!(RT in freq))                       # first time this <key> is seen
      freq[RT] = n++                        # store the next index; n is never reset
   ORS="<" freq[RT] ">"                     # print <index> in place of the matched <key>
}
{
   print $0 > ("/tmp/" FILENAME)            # write to /tmp/ under the same file name
}' file_{a,b}.txt
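
The RS/RT approach above relies on gawk extensions. Where only a POSIX awk is available, a rough alternative is to loop over each line with match() and consume one <...> occurrence at a time. The sketch below is not part of the original answer. The array name idx, the variables out, rest and line, and the file_a.txt to file_a_new.txt naming scheme are assumptions made for illustration. It first recreates the sample files from the question in the current directory.

```shell
#!/bin/sh
# Sample input copied from the question (created in the current directory).
printf '%s\n' '123abc<abc>xyz' 'efg' '<b>ah' 'a<c>, <abc>' '<c>b' '(<abc>, <b>)' > file_a.txt
printf '%s\n' 'xyz(<c>, <b>)' 'xyz<b>xy<abc>z' > file_b.txt

awk '
FNR == 1 {
    # First line of a new input file: close the previous output file and
    # derive the new name, e.g. file_a.txt -> file_a_new.txt (assumed scheme).
    if (out != "") close(out)
    out = FILENAME
    sub(/\.txt$/, "", out)
    out = out "_new.txt"
}
{
    rest = $0
    line = ""
    # Consume the line one <...> match at a time.
    while (match(rest, /<[^>]+>/)) {
        key = substr(rest, RSTART + 1, RLENGTH - 2)   # text between < and >
        if (!(key in idx))
            idx[key] = n++        # n is never reset, so indices span all files
        line = line substr(rest, 1, RSTART - 1) "<" idx[key] ">"
        rest = substr(rest, RSTART + RLENGTH)
    }
    print line rest > out
}' file_a.txt file_b.txt
```

Run on the sample input, this should produce file_a_new.txt and file_b_new.txt with the contents shown in the question, leaving the original files untouched.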
