Replacing multiple strings in multiple files
Question
I have a file containing a list of regular expressions and literal replacement strings in the following format:
OLD_REGEXP_1 NEW_STRING_1
OLD_REGEXP_2 NEW_STRING_2
...
I want to replace all of the strings that match OLD_REGEXP_X with NEW_STRING_X in multiple files *.txt.
I believe this is a common question and that someone must have done something similar before, but I couldn't find an existing solution written in bash.
For example:
Tom Thompson
Billy Bill&Ted
goog1e\.com google.com
https?://www\.google\.com https://google.com
Input:
Tom and Billy are visiting http://www.goog1e.com
Expected output:
Thompson and Bill&Ted are visiting https://google.com
The main challenges are:
- The strings to be replaced are described by POSIX Extended Regular Expressions, not literals. Any character that is not a POSIX ERE metacharacter, including / (which some tools use as a regexp delimiter), must be treated as literal.
- The replacement strings are literal and can contain any character, including & and \1, which are often used as backreference metacharacters in replacement strings but must be literal in this case.
- Replacements must occur in the order they appear in the mapping file. If the mapping file has A->B and B->C in that order and A appears in the text file to be changed, the output will contain "C" in place of "A", not "B".
Recommended answer
Given what you've told us so far, in the question and in the comments, and considering the strings I can think of that aren't in your example but could occur, it sounds like this is what you need. It does not handle strings that contain spaces: for that you'd have to tell us how to distinguish old from new in the mapfile.
$ cat mapfile
Tom Thompson
Billy Bill&Ted
goog1e\.com google.com
https?://www\.google\.com https://google.com
$ cat textfile
Tom and Billy are visiting http://www.goog1e.com
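awk '
NR==FNR {                     # first file (mapfile): load the old/new pairs
    old[NR] = $1
    gsub(/&/,RS,$2)           # hide each & behind RS so gsub() cannot expand it
    new[NR] = $2
    next
}
{
    for (i=1; i in old; i++) {
        gsub(old[i],new[i])   # apply the pairs in mapfile order
    }
    gsub(RS,"\\&")            # turn the RS placeholders back into literal &
    print
}
' mapfile textfile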
Thompson and Bill&Ted are visiting https://google.com
The above treats the "old string" as a regexp, treats the "new string" as a literal string with no backreferences, and applies the replacements strictly in the order defined in your mapfile.
The first gsub() converts every & in the replacement string to a Record Separator (which cannot be present in the text, since we're operating WITHIN a record), so that the second gsub() will not treat the &s in the new string as backreferences. The third gsub() then just turns the RSs back into &s.
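To see why the & needs protecting, here is a minimal sketch that compares a naive gsub() with the RS-placeholder approach. The variable names naive, safe and new are only for illustration, and the sample strings are the "Billy" pair from the mapfile above:

```shell
# Naive: "&" in the replacement expands to the matched text.
naive=$(printf 'Billy\n' | awk '{ gsub(/Billy/, "Bill&Ted"); print }')
echo "$naive"

# Protected: swap "&" for RS first, substitute, then restore it.
safe=$(printf 'Billy\n' | awk '{
    new = "Bill&Ted"
    gsub(/&/, RS, new)      # "&" -> newline placeholder
    gsub(/Billy/, new)      # no "&" left to be expanded
    gsub(RS, "\\&")         # placeholder -> literal "&"
    print
}')
echo "$safe"
```

The first command prints BillBillyTed because & stands for the matched text, while the second prints Bill&Ted.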
The above will work using any awk in any shell on any UNIX system.
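The script above reads one text file and writes to stdout. One way to apply it to every *.txt file, as the question asks, is a plain shell loop that writes to a temporary file and then moves it over the original. This is only a sketch under some assumptions: the scratch directory, the a.txt sample file and the .tmp suffix are made up for the demonstration, and it assumes the mapfile is not empty.

```shell
# Hypothetical setup: a scratch directory holding the mapfile and one sample file.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'Tom Thompson' 'Billy Bill&Ted' \
    'goog1e\.com google.com' \
    'https?://www\.google\.com https://google.com' > mapfile
printf '%s\n' 'Tom and Billy are visiting http://www.goog1e.com' > a.txt

# Run the same awk script once per file, replacing each file with its output.
for f in *.txt; do
    awk '
    NR==FNR {
        old[NR] = $1
        gsub(/&/,RS,$2)
        new[NR] = $2
        next
    }
    {
        for (i=1; i in old; i++) {
            gsub(old[i],new[i])
        }
        gsub(RS,"\\&")
        print
    }
    ' mapfile "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat a.txt
```

The temporary file keeps this portable, since awk itself has no standard in-place option, and the && means a file is only replaced if awk succeeded on it.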