Change string in file between two strings with character X


Question

I want to replace the value between tags with an equal number of X characters. For example:

1.

<Name> Jason </Name>
to
<Name> XXXXX </Name>

2. (note: no spaces)

 <Name>Jim</Name>
 to
 <Name>XXX</Name>

3.

<Name Jason /> 
to 
<Name XXXXX />

4.

<Name Jas />
to
<Name XXX />

The starting tag, the value, and the closing tag can each be on a different line.

5.

<Name>Jim
</Name>
to
<Name>XXX
</Name>

6.

<Name>
     Jim
       </Name>
to
<Name>
     XXX
       </Name>

7.

  <Name
     Jim
       />
to
  <Name
     XXX
       />

8.

<Name> Jason </Name> <Name> Ignacio </Name>
to
<Name> XXXXX </Name> <Name> XXXXXXX </Name>

9.

<Name> Jason Ignacio </Name>
to
<Name> XXXXX XXXXXXX </Name>
or
<Name> XXXXXXXXXXXXX </Name>

Either is fine.

I tried this, but it did not work:

file=mylog.log
search_str="<Name>"
end_str="</Name>"
sed -i -E ':a; s/('"$search_str"'X*)[^X'"$end_str"']/\1X/; ta' "$file"

Please let me know how to do this in a bash script.

Update:

I also tried this, but it did not work for cases 6 and 7. Cases 1 to 5 worked.

sed -i -E '/<Name>/{:a; /<\/Name>/bb; n; ba; :b; s/(<Name>X*)[^X\<]/\1X/; tb; }' "$file"
sed -i -E '/<Name[[:space:]]/{:a; /\/>/bb; n; ba; :b; s/(<Name[[:space:]]X*)[^X\/]/\1X/; tb; }' "$file"

Recommended answer

Provisional solution

This extends the 'initial offering' below and handles cases 1, 2, 5, 6, 8, 9. It does not handle the case where there is one or more complete <Name>…</Name> entries and also a starting <Name> without the matching </Name> on the same line. Frankly, I'm not even sure how to start tackling that scenario.

The unhandled cases 3, 4 and 7 are not valid XML, and I'm not convinced they're valid HTML (or XHTML) either. I believe they can be handled by a similar (but simpler) mechanism to the one shown here for the full <Name>…</Name> version. I'm leaving that as an exercise for the reader (beware the < in the character class; it would need to become a /).

/<Name>/! b
/<Name>.*<\/Name>/{
: l1
s/\(<Name>[[:space:]]*\(X[X[[:space:]]*\)\{0,1\}\)[^X<[:space:]]\(.*[[:space:]]*<\/Name>\)/\1X\3/
t l1
b
}
/<Name>/,/<\/Name>/{
  # Handle up to 4 lines to the end-name tag
  /<\/Name>/! N
  /<\/Name>/! N
  /<\/Name>/! N
  /<\/Name>/! N
# s/^/ZZ/; s/$/AA/p
# s/^ZZ//; s/AA$//
  : l2
  s/\(<Name>[[:space:]]*\(X[X[[:space:]]*\)\{0,1\}\)[^X<[:space:]]\(.*[[:space:]]*<\/Name>\)/\1X\3/
  t l2
}

The first line 'skips' processing of lines that do not contain <Name>: they are printed and the next line is read. The next 6 lines are the script from the 'initial offering', except that a b has been added to jump to the end of processing.

The new section is the /<Name>/,/<\/Name>/ code. It looks for a <Name> on its own and appends up to 4 more lines until the pattern space includes a </Name>. The two comment lines were used for debugging; they allowed me to see what was being treated as a unit. Except for the use of the label l2 in place of l1, the remainder is exactly the same as in the initial offering, because the sed regexes there already accommodate newlines.

This is heavy-duty sed scripting and not what I'd want to use or maintain. I would go with a Perl solution using an XML parser (because I know Perl better than Python), but Python would do the job fine too with an appropriate XML parser.
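As a rough illustration of the scripting-language route, here is a minimal Python sketch. Note that it is not the XML-parser approach mentioned above: it assumes the file is a plain log that may not be well-formed XML, so it applies regular expressions to the whole text instead. The names ELEMENT, SELF_CLOSING and mask_names are made up for this example, and the second pattern is only one guess at how the <Name value /> cases (3, 4 and 7) could be covered.

```python
import re

# Matches <Name>value</Name>; DOTALL lets the value span several lines.
ELEMENT = re.compile(r"(<Name>)(.*?)(</Name>)", re.DOTALL)
# Matches the question's unorthodox <Name value /> form (cases 3, 4, 7).
SELF_CLOSING = re.compile(r"(<Name\s)([^<>]*?)(/>)", re.DOTALL)


def _mask(match):
    # Replace every non-whitespace character of the value with X,
    # keeping spaces and newlines so the layout is unchanged.
    return match.group(1) + re.sub(r"\S", "X", match.group(2)) + match.group(3)


def mask_names(text):
    """Return text with the value of every Name entry masked."""
    text = ELEMENT.sub(_mask, text)
    return SELF_CLOSING.sub(_mask, text)


print(mask_names("<Name> Jason </Name>"))  # prints: <Name> XXXXX </Name>
```

Because the whole text is handled at once, there is no limit on how many lines an entry may span. The trade-off is that the file has to be read into memory, for example with open("mylog.log").read().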

A slightly extended data file:

<Name> Jason </Name>
<Name>Jim</Name>
<Name> Jason Bourne </Name>
<Name> Elijah </Name> <Name> Dennis </Name>
<Name> Elijah Wood </Name> <Name> Dennis The Menace </Name>
<Name>Elijah Wood</Name> <Name>Dennis The Menace</Name>
<Name> Jason
        </Name>
<Name>
    Jim</Name>
<Name>
    Jim
        </Name>
<Name> Jason
Bourne </Name>
<Name> 
    Jason
        Bourne
            </Name>
<Name> Elijah </Name>
<Name>
Dennis
</Name>
<Name> Elijah
Wood </Name>
            <Name> Dennis
The Menace </Name>
<Name>Elijah
Wood</Name>
    <Name>Dennis The
Menace</Name>



<Name> Jason </Name>
to
<Name> XXXXX </Name>

2. (see no space)

 <Name>Jim</Name>
 to
 <Name>XXX</Name>

3.

<!--Name Jason /--> 
to 
<!--Name XXXXX /-->`

4.

<!--Name Jas /-->
to
<!--Name XXX /-->

starting tag, value and closing tag can all come in different line

5.

<Name>Jim
</Name>
to
<Name>XXX
</Name>

6.

<Name>
     Jim
       </Name>
to
<Name>
     XXX
       </Name>

7.

  <!--Name
     Jim
       /-->
to
  <!--Name
     XXX
       /-->

8.

<Name> Jason </Name> <Name> Ignacio </Name>
to
<Name> XXXXX </Name> <Name> XXXXXX </Name>

9.

<Name> Jason Ignacio </Name>
to
<Name> XXXXX XXXXXXX </Name>
or
<Name> XXXXXXXXXXXXX </Name>

No claim is made that the data file contains a minimal set of cases; it is repetitious. It includes the material from the question, except that the 'unorthodox' XML elements such as <Name Value /> are converted into XML comments <!--Name Value /-->. The mapping is not actually crucial: the opening part does not match <Name> (and the tail does not match </Name>), so those elements would not be processed anyway.

$ sed -f script.sed data
<Name> XXXXX </Name>
<Name>XXX</Name>
<Name> XXXXX XXXXXX </Name>
<Name> XXXXXX </Name> <Name> XXXXXX </Name>
<Name> XXXXXX XXXX </Name> <Name> XXXXXX XXX XXXXXX </Name>
<Name>XXXXXX XXXX</Name> <Name>XXXXXX XXX XXXXXX</Name>
<Name> XXXXX
        </Name>
<Name>
    XXX</Name>
<Name>
    XXX
        </Name>
<Name> XXXXX
XXXXXX </Name>
<Name> 
    XXXXX
        XXXXXX
            </Name>
<Name> XXXXXX </Name>
<Name>
XXXXXX
</Name>
<Name> XXXXXX
XXXX </Name>
            <Name> XXXXXX
XXX XXXXXX </Name>
<Name>XXXXXX
XXXX</Name>
    <Name>XXXXXX XXX
XXXXXX</Name>



<Name> XXXXX </Name>
to
<Name> XXXXX </Name>

2. (see no space)

 <Name>XXX</Name>
 to
 <Name>XXX</Name>

3.

<!--Name Jason /--> 
to 
<!--Name XXXXX /-->`

4.

<!--Name Jas /-->
to
<!--Name XXX /-->

starting tag, value and closing tag can all come in different line

5.

<Name>XXX
</Name>
to
<Name>XXX
</Name>

6.

<Name>
     XXX
       </Name>
to
<Name>
     XXX
       </Name>

7.

  <!--Name
     Jim
       /-->
to
  <!--Name
     XXX
       /-->

8.

<Name> XXXXX </Name> <Name> XXXXXXX </Name>
to
<Name> XXXXX </Name> <Name> XXXXXX </Name>

9.

<Name> XXXXX XXXXXXX </Name>
to
<Name> XXXXX XXXXXXX </Name>
or
<Name> XXXXXXXXXXXXX </Name>
$


Initial offering

This is a partial answer, but it illustrates the problems you face. To deal with cases 1 and 2 in the question, plus the multi-word variations, you can use this script:

/<Name>.*<\/Name>/{
: l1
s/\(<Name>[[:space:]]*\(X[X[[:space:]]*\)\{0,1\}\)[^X<[:space:]]\(.*[[:space:]]*<\/Name>\)/\1X\3/
t l1
}

That is pretty contorted, to be polite about it. It looks for <Name> followed by zero or more whitespace characters. That can be followed by \(X[X[[:space:]]*\)\{0,1\}, which means zero or one occurrences of an X followed by a sequence of X's or whitespace. All of that is captured as \1 in the replacement. Then there is a single character that is not an X, a < or whitespace, followed by zero or more of any character, zero or more whitespace characters, and </Name>; that tail is captured as \3. The single character in the middle is replaced by an X. The label : l1 and the conditional branch t l1 repeat the whole substitution until there are no more matches. All of this operates only on a line that contains both <Name> and </Name>.
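To make the one-character-per-pass loop easier to follow, here is a small Python sketch of the same idea. It is a re-spelling for illustration, not the sed script itself: \s stands in for [[:space:]], a non-capturing group replaces the nested \(...\)\{0,1\} (so the tail is group 2 rather than \3), and the names STEP and mask_line are invented for this example.

```python
import re

# Group 1: <Name>, optional whitespace, and whatever has already been masked.
# Then exactly one character that still needs masking.
# Group 2: the rest of the line up to a closing tag.
STEP = re.compile(r"(<Name>\s*(?:X[X\s]*)?)[^X<\s](.*\s*</Name>)")


def mask_line(line):
    # Mirrors the address /<Name>.*<\/Name>/: other lines pass through.
    if not re.search(r"<Name>.*</Name>", line):
        return line
    while True:
        # One substitution per pass, like a single s/// without the g flag.
        masked = STEP.sub(r"\1X\2", line, count=1)
        if masked == line:  # nothing left to replace, so stop looping
            return line
        line = masked
```

Each pass turns exactly one character into an X, so a five-letter name takes five passes plus a final pass that finds nothing to change. A line that holds one complete entry and the start of another has only the complete entry masked, as in the sed version.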

<Name> Jason </Name>
<Name>Jim</Name>
<Name> Jason Bourne </Name>
<Name> Elijah </Name> <Name> Dennis </Name>
<Name> Elijah Wood </Name> <Name> Dennis The Menace </Name>
<Name>Elijah Wood</Name> <Name>Dennis The Menace</Name>
<Name> Jason
</Name>
<Name>
Jim</Name>
<Name> Jason
Bourne </Name>
<Name> Elijah </Name> <Name> Dennis
</Name>
<Name> Elijah
Wood </Name> <Name> Dennis
The Menace </Name>
<Name>Elijah
Wood</Name> <Name>Dennis The
Menace</Name>

Output:

$ sed -f script.sed data
<Name> XXXXX </Name>
<Name>XXX</Name>
<Name> XXXXX XXXXXX </Name>
<Name> XXXXXX </Name> <Name> XXXXXX </Name>
<Name> XXXXXX XXXX </Name> <Name> XXXXXX XXX XXXXXX </Name>
<Name>XXXXXX XXXX</Name> <Name>XXXXXX XXX XXXXXX</Name>
<Name> Jason
</Name>
<Name>
Jim</Name>
<Name> Jason
Bourne </Name>
<Name> XXXXXX </Name> <Name> Dennis
</Name>
<Name> Elijah
Wood </Name> <Name> Dennis
The Menace </Name>
<Name>Elijah
Wood</Name> <Name>Dennis The
Menace</Name>
$

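Note the replacement partway through the output, on the line that begins <Name> XXXXXX </Name> <Name> Dennis: the complete entry was masked, but the entry that starts on that line and ends on the next was not. That kind of line is going to cause headaches for any more complete solution.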

I've not worked out how the script would handle the various split-line cases, beyond the fact that it would almost certainly need to join lines until the </Name> is caught. It would then do processing closely related to that already shown, but it would need to allow for newlines in the matched material.
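The idea of joining lines until the closing tag arrives can be sketched in Python as follows. This is only an illustration under simplifying assumptions: entries are not nested, and comparing the counts of <Name> and </Name> seen so far is enough to tell whether an entry is still open. The names OPEN, CLOSE, ELEMENT and mask_stream are invented for the example.

```python
import re

OPEN, CLOSE = "<Name>", "</Name>"
# DOTALL lets the value between the tags contain newlines.
ELEMENT = re.compile(r"(<Name>)(.*?)(</Name>)", re.DOTALL)


def mask_stream(lines):
    """Yield chunks of input with every <Name>...</Name> value masked."""
    pending = ""
    for line in lines:
        pending += line
        # Keep joining while some <Name> still lacks its </Name>.
        if pending.count(OPEN) > pending.count(CLOSE):
            continue
        yield ELEMENT.sub(
            lambda m: m.group(1) + re.sub(r"\S", "X", m.group(2)) + m.group(3),
            pending,
        )
        pending = ""
    if pending:  # unterminated <Name> at end of input: emit unchanged
        yield pending
```

Passing an open file object as lines and writing each yielded chunk to the output would process a file in one pass. A line with a complete entry followed by an entry that continues on the next line is simply held back until the second closing tag is seen.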
