Change string in file between two strings with character X
Problem description
I want to replace the value between tags with an equal number of X characters. For example:
1.
<Name> Jason </Name>
to
<Name> XXXXX </Name>
2. (note: no spaces)
<Name>Jim</Name>
to
<Name>XXX</Name>
3.
<Name Jason />
to
<Name XXXXX />
4.
<Name Jas />
to
<Name XXX />
The starting tag, the value, and the closing tag can each be on a different line:
5.
<Name>Jim
</Name>
to
<Name>XXX
</Name>
6.
<Name>
Jim
</Name>
to
<Name>
XXX
</Name>
7.
<Name
Jim
/>
to
<Name
XXX
/>
8.
<Name> Jason </Name> <Name> Ignacio </Name>
to
<Name> XXXXX </Name> <Name> XXXXXXX </Name>
9.
<Name> Jason Ignacio </Name>
to
<Name> XXXXX XXXXXXX </Name>
or
<Name> XXXXXXXXXXXXX </Name>
Either is fine.
I tried this, but it didn't work:
file=mylog.log
search_str="<Name>"
end_str="</Name>"
sed -i -E ':a; s/('"$search_str"'X*)[^X'"$end_str"']/\1X/; ta' "$file"
Please let me know how to do this in a bash script.
Update:
I also tried this, but it didn't work for cases 6 and 7. Cases 1 to 5 worked.
sed -i -E '/<Name>/{:a; /<\/Name>/bb; n; ba; :b; s/(<Name>X*)[^X\<]/\1X/; tb; }' "$file"
sed -i -E '/<Name[[:space:]]/{:a; /\/>/bb; n; ba; :b; s/(<Name[[:space:]]X*)[^X\/]/\1X/; tb; }' "$file"
Recommended answer
Provisional solution
This extends the 'initial offering' below and handles cases 1, 2, 5, 6, 8, 9. It does not handle the case where a line holds one or more complete <Name>…</Name> entries and also a starting <Name> without the matching </Name> on the same line. Frankly, I'm not even sure how to start tackling that scenario.
The unhandled cases 3, 4, 7 are not valid XML, and I'm not convinced they're valid HTML (or XHTML) either. I believe they can be handled by a similar (but simpler) mechanism to the one shown here for the full <Name>…</Name> version. I'm leaving that as an exercise for the reader (beware the < in the character class: it would need to become a /).
/<Name>/! b
/<Name>.*<\/Name>/{
: l1
s/\(<Name>[[:space:]]*\(X[X[[:space:]]*\)\{0,1\}\)[^X<[:space:]]\(.*[[:space:]]*<\/Name>\)/\1X\3/
t l1
b
}
/<Name>/,/<\/Name>/{
# Handle up to 4 lines to the end-name tag
/<\/Name>/! N
/<\/Name>/! N
/<\/Name>/! N
/<\/Name>/! N
# s/^/ZZ/; s/$/AA/p
# s/^ZZ//; s/AA$//
: l2
s/\(<Name>[[:space:]]*\(X[X[[:space:]]*\)\{0,1\}\)[^X<[:space:]]\(.*[[:space:]]*<\/Name>\)/\1X\3/
t l2
}
The first line 'skips' processing of lines not containing <Name> (they get printed and the next line is read). The next 6 lines are the script from the 'initial offering', except that there's a b to jump to the end of processing.
The new section is the /<Name>/,/<\/Name>/ code. This looks for a <Name> without its </Name> on the same line, and concatenates up to 4 more lines until a </Name> is included in the pattern space. The two comment lines were used for debugging; they allowed me to see what was being treated as a unit. Except for the use of the label l2 in place of l1, the remainder is exactly the same as in the initial offering, because sed regexes already accommodate newlines.
This is heavy-duty sed scripting and not what I'd want to use or maintain. I would go with a Perl solution using an XML parser (because I know Perl better than Python), but Python would do the job fine too with an appropriate XML parser.
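Returning to the exercise left for the reader above (cases 3 and 4, the `<Name Jason />` form on a single line), one possible shape for it is sketched below. It is an untested-in-production illustration, not part of the original script: the label `l3` and the sample input are made up, `|` is used as the `s` delimiter so that `/` can sit in the bracket expression without escaping, and the inner class is written `[X[:space:]]` (the "X's or spaces" the prose describes). The multi-line case 7 is not covered.

```shell
# Hypothetical sample input: the question's cases 3 and 4.
input='<Name Jason />
<Name Jas />'

# Same loop idea as the <Name>...</Name> version, with two changes:
#  - the opener is "<Name" followed by at least one whitespace character;
#  - "/" replaces "<" in the negated class, so masking stops before "/>".
masked=$(printf '%s\n' "$input" | sed \
  -e '/<Name[[:space:]].*\/>/{' \
  -e ':l3' \
  -e 's|\(<Name[[:space:]]\{1,\}\(X[X[:space:]]*\)\{0,1\}\)[^X/[:space:]]\(.*/>\)|\1X\3|' \
  -e 't l3' \
  -e '}')

printf '%s\n' "$masked"
```

Each pass replaces the first character after the run of X's and whitespace, exactly as in the main script; the loop ends when the next candidate character is the `/` of `/>`.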
A slightly extended data file:
<Name> Jason </Name>
<Name>Jim</Name>
<Name> Jason Bourne </Name>
<Name> Elijah </Name> <Name> Dennis </Name>
<Name> Elijah Wood </Name> <Name> Dennis The Menace </Name>
<Name>Elijah Wood</Name> <Name>Dennis The Menace</Name>
<Name> Jason
</Name>
<Name>
Jim</Name>
<Name>
Jim
</Name>
<Name> Jason
Bourne </Name>
<Name>
Jason
Bourne
</Name>
<Name> Elijah </Name>
<Name>
Dennis
</Name>
<Name> Elijah
Wood </Name>
<Name> Dennis
The Menace </Name>
<Name>Elijah
Wood</Name>
<Name>Dennis The
Menace</Name>
<Name> Jason </Name>
to
<Name> XXXXX </Name>
2. (see no space)
<Name>Jim</Name>
to
<Name>XXX</Name>
3.
<!--Name Jason /-->
to
<!--Name XXXXX /-->`
4.
<!--Name Jas /-->
to
<!--Name XXX /-->
starting tag, value and closing tag can all come in different line
5.
<Name>Jim
</Name>
to
<Name>XXX
</Name>
6.
<Name>
Jim
</Name>
to
<Name>
XXX
</Name>
7.
<!--Name
Jim
/-->
to
<!--Name
XXX
/-->
8.
<Name> Jason </Name> <Name> Ignacio </Name>
to
<Name> XXXXX </Name> <Name> XXXXXX </Name>
9.
<Name> Jason Ignacio </Name>
to
<Name> XXXXX XXXXXXX </Name>
or
<Name> XXXXXXXXXXXXX </Name>
No claims are made that the data file contains a minimal set of cases; it is repetitious. It includes the material from the question, except that the 'unorthodox' XML elements like <Name Value /> are converted into XML comments <!--Name Value /-->. The mapping actually isn't crucial: the opening part doesn't match <Name> (and the tail doesn't match </Name>), so they'd not be processed anyway.
$ sed -f script.sed data
<Name> XXXXX </Name>
<Name>XXX</Name>
<Name> XXXXX XXXXXX </Name>
<Name> XXXXXX </Name> <Name> XXXXXX </Name>
<Name> XXXXXX XXXX </Name> <Name> XXXXXX XXX XXXXXX </Name>
<Name>XXXXXX XXXX</Name> <Name>XXXXXX XXX XXXXXX</Name>
<Name> XXXXX
</Name>
<Name>
XXX</Name>
<Name>
XXX
</Name>
<Name> XXXXX
XXXXXX </Name>
<Name>
XXXXX
XXXXXX
</Name>
<Name> XXXXXX </Name>
<Name>
XXXXXX
</Name>
<Name> XXXXXX
XXXX </Name>
<Name> XXXXXX
XXX XXXXXX </Name>
<Name>XXXXXX
XXXX</Name>
<Name>XXXXXX XXX
XXXXXX</Name>
<Name> XXXXX </Name>
to
<Name> XXXXX </Name>
2. (see no space)
<Name>XXX</Name>
to
<Name>XXX</Name>
3.
<!--Name Jason /-->
to
<!--Name XXXXX /-->`
4.
<!--Name Jas /-->
to
<!--Name XXX /-->
starting tag, value and closing tag can all come in different line
5.
<Name>XXX
</Name>
to
<Name>XXX
</Name>
6.
<Name>
XXX
</Name>
to
<Name>
XXX
</Name>
7.
<!--Name
Jim
/-->
to
<!--Name
XXX
/-->
8.
<Name> XXXXX </Name> <Name> XXXXXXX </Name>
to
<Name> XXXXX </Name> <Name> XXXXXX </Name>
9.
<Name> XXXXX XXXXXXX </Name>
to
<Name> XXXXX XXXXXXX </Name>
or
<Name> XXXXXXXXXXXXX </Name>
$
Initial offering
A partial answer, but it illustrates the problems you face. To deal with cases 1 and 2 in the question, plus the multi-word variations, you can use the script:
/<Name>.*<\/Name>/{
: l1
s/\(<Name>[[:space:]]*\(X[X[[:space:]]*\)\{0,1\}\)[^X<[:space:]]\(.*[[:space:]]*<\/Name>\)/\1X\3/
t l1
}
That is pretty contorted, to be polite about it. It looks for <Name> followed by zero or more whitespace characters. That can be followed by \(X[X[[:space:]]*\)\{0,1\}, which means zero or one occurrences of an X followed by a sequence of X's or spaces. All of that is captured as \1 in the replacement. Then there's a single character that isn't an X, a <, or whitespace, followed by zero or more of any character, zero or more whitespace characters, and </Name> (captured as \3). The single character in the middle is replaced by an X. The whole replacement is repeated, via the label : l1 and the conditional branch t l1, until there are no more matches. All of that operates only on a line containing both <Name> and </Name>.
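The one-character-per-pass behaviour described above can be watched directly. The sketch below runs the same substitution on a single made-up line, but adds the `p` flag and `-n` so that the pattern space is printed after every successful pass; those two additions are for illustration only and are not part of the original script.

```shell
# Trace the l1 loop on one sample line: each successful substitution
# prints the pattern space (p flag); -n suppresses the final auto-print.
steps=$(printf '%s\n' '<Name>Jim</Name>' | sed -n \
  -e ':l1' \
  -e 's/\(<Name>[[:space:]]*\(X[X[[:space:]]*\)\{0,1\}\)[^X<[:space:]]\(.*[[:space:]]*<\/Name>\)/\1X\3/p' \
  -e 't l1')

printf '%s\n' "$steps"
```

This should print three lines, `<Name>Xim</Name>`, `<Name>XXm</Name>` and `<Name>XXX</Name>`. On the fourth attempt the only character available after the X's is `<`, which the negated class rejects, so `t l1` does not branch and the loop stops.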
Data file:
<Name> Jason </Name>
<Name>Jim</Name>
<Name> Jason Bourne </Name>
<Name> Elijah </Name> <Name> Dennis </Name>
<Name> Elijah Wood </Name> <Name> Dennis The Menace </Name>
<Name>Elijah Wood</Name> <Name>Dennis The Menace</Name>
<Name> Jason
</Name>
<Name>
Jim</Name>
<Name> Jason
Bourne </Name>
<Name> Elijah </Name> <Name> Dennis
</Name>
<Name> Elijah
Wood </Name> <Name> Dennis
The Menace </Name>
<Name>Elijah
Wood</Name> <Name>Dennis The
Menace</Name>
Output
$ sed -f script.sed data
<Name> XXXXX </Name>
<Name>XXX</Name>
<Name> XXXXX XXXXXX </Name>
<Name> XXXXXX </Name> <Name> XXXXXX </Name>
<Name> XXXXXX XXXX </Name> <Name> XXXXXX XXX XXXXXX </Name>
<Name>XXXXXX XXXX</Name> <Name>XXXXXX XXX XXXXXX</Name>
<Name> Jason
</Name>
<Name>
Jim</Name>
<Name> Jason
Bourne </Name>
<Name> XXXXXX </Name> <Name> Dennis
</Name>
<Name> Elijah
Wood </Name> <Name> Dennis
The Menace </Name>
<Name>Elijah
Wood</Name> <Name>Dennis The
Menace</Name>
$
Note the replacement partway through, near the end: the line <Name> XXXXXX </Name> <Name> Dennis has its first element masked and its second left alone. That line is going to cause headaches for anything more ambitious.
I've not worked out how the script would handle the various split-line cases, beyond noting that it would almost certainly need to join lines until the </Name> is caught. It would then do processing closely related to that already shown, but it would need to allow for newlines in the matched material.
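One way to sidestep the line-joining bookkeeping altogether is to read the whole input into the pattern space first and only then run the masking loop. This is a different technique from the one used in this answer, offered as a rough sketch under assumptions: the labels `slurp` and `mask` and the sample input are made up, the inner class is written `[X[:space:]]`, and the whole file must fit comfortably in memory (the loop rescans the buffer once per masked character, so it suits modest files only).

```shell
# Hypothetical input: a complete element followed by one that closes on
# the next line, plus a fully split element.
input='<Name> Elijah </Name> <Name> Dennis
</Name>
<Name>
Jim
</Name>'

# 1. slurp: append every remaining line to the pattern space.
# 2. mask:  replace one non-X, non-"<", non-whitespace character per pass
#           after a <Name>, as long as a </Name> follows somewhere later.
masked=$(printf '%s\n' "$input" | sed \
  -e ':slurp' \
  -e '$!{' \
  -e 'N' \
  -e 'b slurp' \
  -e '}' \
  -e ':mask' \
  -e 's/\(<Name>[[:space:]]*\(X[X[:space:]]*\)\{0,1\}\)[^X<[:space:]]\(.*[[:space:]]*<\/Name>\)/\1X\3/' \
  -e 't mask')

printf '%s\n' "$masked"
```

Because the pattern space holds every line, an opening <Name> always has its closing tag in view, so the mixed line from the output above no longer needs special treatment. Line breaks survive because newlines count as whitespace and are never replaced.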