Change string in file between two strings with character X

Problem description

I want to replace the value inside a tag with an equal number of X characters. For example:

1.

<Name> Jason </Name>
to
<Name> XXXXX </Name>

2. (note there are no spaces)

 <Name>Jim</Name>
 to
 <Name>XXX</Name>

3.

<Name Jason /> 
to 
<Name XXXXX />

4.

<Name Jas />
to
<Name XXX />

The starting tag, the value and the closing tag can each be on a different line.

5.

<Name>Jim
</Name>
to
<Name>XXX
</Name>

6.

<Name>
     Jim
       </Name>
to
<Name>
     XXX
       </Name>

7.

  <Name
     Jim
       />
to
  <Name
     XXX
       />

8.

<Name> Jason </Name> <Name> Ignacio </Name>
to
<Name> XXXXX </Name> <Name> XXXXXX </Name>

9.

<Name> Jason Ignacio </Name>
to
<Name> XXXXX XXXXXXX </Name>
or
<Name> XXXXXXXXXXXXX </Name>

Either is fine.

I tried this, but it did not work:

file=mylog.log
search_str="<Name>"
end_str="</Name>"
sed -i -E ':a; s/('"$search_str"'X*)[^X'"$end_str"']/\1X/; ta' "$file"

Please let me know how to do this in a bash script.

Update:

I also tried this, but it did not work for cases 6 and 7; cases 1 to 5 worked.

sed -i -E '/<Name>/{:a; /<\/Name>/bb; n; ba; :b; s/(<Name>X*)[^X<]/\1X/; tb; }' "$file"
sed -i -E '/<Name[[:space:]]/{:a; /\/>/bb; n; ba; :b; s/(<Name[[:space:]]X*)[^X/]/\1X/; tb; }' "$file"

Recommended answer

Provisional solution

This extends the 'initial offering' below and handles cases 1, 2, 5, 6, 8, 9. It does not handle the case where there is one or more complete <Name>…</Name> entries and also a starting <Name> without the matching </Name> on the same line. Frankly, I'm not even sure how to start tackling that scenario.

The unhandled cases 3, 4 and 7 are not valid XML, and I'm not convinced they're valid HTML (or XHTML) either. I believe they can be handled by a mechanism similar to (but simpler than) the one shown here for the full <Name>…</Name> version. I'm leaving that as an exercise for the reader (beware the < in the negated character class; it would need to become a /).

/<Name>/! b
/<Name>.*<\/Name>/{
: l1
s/\(<Name>[[:space:]]*\(X[X[[:space:]]*\)\{0,1\}\)[^X<[:space:]]\(.*[[:space:]]*<\/Name>\)/\1X\3/
t l1
b
}
/<Name>/,/<\/Name>/{
  # Handle up to 4 lines to the end-name tag
  /<\/Name>/! N
  /<\/Name>/! N
  /<\/Name>/! N
  /<\/Name>/! N
# s/^/ZZ/; s/$/AA/p
# s/^ZZ//; s/AA$//
  : l2
  s/\(<Name>[[:space:]]*\(X[X[[:space:]]*\)\{0,1\}\)[^X<[:space:]]\(.*[[:space:]]*<\/Name>\)/\1X\3/
  t l2
}

The first line 'skips' processing of lines not containing <Name> (they get printed and the next line is read). The next 6 lines are the script from the 'initial offering' except that there's a b to jump to the end of processing.

The new section is the /<Name>/,/<\/Name>/ code. This looks for a <Name> whose closing tag is not on the same line, and appends up to 4 more lines until a </Name> is included in the pattern space. The two comment lines were used for debugging; they allowed me to see what was being treated as a unit. Except for the use of the label l2 in place of l1, the remainder is exactly the same as in the initial offering, because sed regexes already accommodate newlines.

This is heavy-duty sed scripting and not what I'd want to use or maintain. I would go with a Perl solution using an XML parser (because I know Perl better than Python), but Python would do the job fine too with an appropriate XML parser.
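As a rough illustration of the parser-based route mentioned above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the input is well-formed XML with a single root element (the <log> wrapper in the sample is hypothetical), so the non-XML forms of cases 3, 4 and 7 would raise a parse error rather than be masked. The function name mask_names is made up for this example.

```python
import re
import xml.etree.ElementTree as ET


def mask_names(xml_text: str, tag: str = "Name") -> str:
    """Replace every non-whitespace character inside <tag> elements with X."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        if elem.text:
            # Spaces and newlines are kept, so the layout of the value survives.
            elem.text = re.sub(r"\S", "X", elem.text)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample: a single-line value and a value split over three lines.
sample = "<log><Name> Jason </Name>\n<Name>\n     Jim\n       </Name></log>"
print(mask_names(sample))
```

Because the parser deals with tags and line breaks, the split-line cases need no special handling here. A log file that is not one well-formed document would need a different approach.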

Slightly extended data file used with the sed script:

<Name> Jason </Name>
<Name>Jim</Name>
<Name> Jason Bourne </Name>
<Name> Elijah </Name> <Name> Dennis </Name>
<Name> Elijah Wood </Name> <Name> Dennis The Menace </Name>
<Name>Elijah Wood</Name> <Name>Dennis The Menace</Name>
<Name> Jason
        </Name>
<Name>
    Jim</Name>
<Name>
    Jim
        </Name>
<Name> Jason
Bourne </Name>
<Name> 
    Jason
        Bourne
            </Name>
<Name> Elijah </Name>
<Name>
Dennis
</Name>
<Name> Elijah
Wood </Name>
            <Name> Dennis
The Menace </Name>
<Name>Elijah
Wood</Name>
    <Name>Dennis The
Menace</Name>



<Name> Jason </Name>
to
<Name> XXXXX </Name>

2. (see no space)

 <Name>Jim</Name>
 to
 <Name>XXX</Name>

3.

<!--Name Jason /--> 
to 
<!--Name XXXXX /-->`

4.

<!--Name Jas /-->
to
<!--Name XXX /-->

starting tag, value and closing tag can all come in different line

5.

<Name>Jim
</Name>
to
<Name>XXX
</Name>

6.

<Name>
     Jim
       </Name>
to
<Name>
     XXX
       </Name>

7.

  <!--Name
     Jim
       /-->
to
  <!--Name
     XXX
       /-->

8.

<Name> Jason </Name> <Name> Ignacio </Name>
to
<Name> XXXXX </Name> <Name> XXXXXX </Name>

9.

<Name> Jason Ignacio </Name>
to
<Name> XXXXX XXXXXXX </Name>
or
<Name> XXXXXXXXXXXXX </Name>

No claims are made that the data file contains a minimal set of cases; it is repetitious. It includes the material from the question, except that the 'unorthodox' XML elements like <Name Value /> are converted into XML comments <!--Name Value /-->. The mapping actually isn't crucial; the opening part doesn't match <Name> (and the tail doesn't match </Name>), so they would not be processed anyway.

Output

$ sed -f script.sed data
<Name> XXXXX </Name>
<Name>XXX</Name>
<Name> XXXXX XXXXXX </Name>
<Name> XXXXXX </Name> <Name> XXXXXX </Name>
<Name> XXXXXX XXXX </Name> <Name> XXXXXX XXX XXXXXX </Name>
<Name>XXXXXX XXXX</Name> <Name>XXXXXX XXX XXXXXX</Name>
<Name> XXXXX
        </Name>
<Name>
    XXX</Name>
<Name>
    XXX
        </Name>
<Name> XXXXX
XXXXXX </Name>
<Name> 
    XXXXX
        XXXXXX
            </Name>
<Name> XXXXXX </Name>
<Name>
XXXXXX
</Name>
<Name> XXXXXX
XXXX </Name>
            <Name> XXXXXX
XXX XXXXXX </Name>
<Name>XXXXXX
XXXX</Name>
    <Name>XXXXXX XXX
XXXXXX</Name>



<Name> XXXXX </Name>
to
<Name> XXXXX </Name>

2. (see no space)

 <Name>XXX</Name>
 to
 <Name>XXX</Name>

3.

<!--Name Jason /--> 
to 
<!--Name XXXXX /-->`

4.

<!--Name Jas /-->
to
<!--Name XXX /-->

starting tag, value and closing tag can all come in different line

5.

<Name>XXX
</Name>
to
<Name>XXX
</Name>

6.

<Name>
     XXX
       </Name>
to
<Name>
     XXX
       </Name>

7.

  <!--Name
     Jim
       /-->
to
  <!--Name
     XXX
       /-->

8.

<Name> XXXXX </Name> <Name> XXXXXXX </Name>
to
<Name> XXXXX </Name> <Name> XXXXXX </Name>

9.

<Name> XXXXX XXXXXXX </Name>
to
<Name> XXXXX XXXXXXX </Name>
or
<Name> XXXXXXXXXXXXX </Name>
$

Initial offering

A partial answer — but it illustrates the problems you face. Dealing with cases 1 & 2 in the question, plus the multi-word variations, you can use the script:

/<Name>.*<\/Name>/{
: l1
s/\(<Name>[[:space:]]*\(X[X[[:space:]]*\)\{0,1\}\)[^X<[:space:]]\(.*[[:space:]]*<\/Name>\)/\1X\3/
t l1
}

That is pretty contorted, to put it politely. It looks for <Name> followed by zero or more whitespace characters. That can be followed by \(X[X[[:space:]]*\)\{0,1\}, which means zero or one occurrences of an X followed by a sequence of X's or whitespace. All of that is captured as \1 in the replacement. Then there's a single character that isn't an X, a < or whitespace, followed by zero or more arbitrary characters, zero or more whitespace characters, and </Name>; that tail is captured as \3. The single character in the middle is replaced by an X. The whole substitution is repeated, via the label : l1 and the conditional branch t l1, until there are no more matches. All of that operates only on lines containing both <Name> and </Name>.
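To make the one-character-per-pass loop easier to follow, here is a small Python rendering of the same idea. It is only a sketch: the names STEP and mask_line are invented for this example, and Python's \s stands in for [[:space:]]. It is not a replacement for the sed script.

```python
import re

# Group 1: "<Name>", optional whitespace, and any run of X's/whitespace that
# has already been masked. The single character after it is the next one to
# mask. Group 3: the rest of the line up to a closing </Name>.
STEP = re.compile(r"(<Name>\s*(X[X\s]*)?)[^X<\s](.*\s*</Name>)")


def mask_line(line: str) -> str:
    """Mask one character per pass until nothing matches (like ': l1' and 't l1')."""
    while True:
        line, count = STEP.subn(r"\g<1>X\g<3>", line, count=1)
        if count == 0:
            return line


print(mask_line("<Name> Jason </Name>"))
print(mask_line("<Name> Elijah </Name> <Name> Dennis"))
```

In the second call only the first name is masked, because the second <Name> has no </Name> after it on that line. The same line appears in the data file and output that follow.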

<Name> Jason </Name>
<Name>Jim</Name>
<Name> Jason Bourne </Name>
<Name> Elijah </Name> <Name> Dennis </Name>
<Name> Elijah Wood </Name> <Name> Dennis The Menace </Name>
<Name>Elijah Wood</Name> <Name>Dennis The Menace</Name>
<Name> Jason
</Name>
<Name>
Jim</Name>
<Name> Jason
Bourne </Name>
<Name> Elijah </Name> <Name> Dennis
</Name>
<Name> Elijah
Wood </Name> <Name> Dennis
The Menace </Name>
<Name>Elijah
Wood</Name> <Name>Dennis The
Menace</Name>

Output

$ sed -f script.sed data
<Name> XXXXX </Name>
<Name>XXX</Name>
<Name> XXXXX XXXXXX </Name>
<Name> XXXXXX </Name> <Name> XXXXXX </Name>
<Name> XXXXXX XXXX </Name> <Name> XXXXXX XXX XXXXXX </Name>
<Name>XXXXXX XXXX</Name> <Name>XXXXXX XXX XXXXXX</Name>
<Name> Jason
</Name>
<Name>
Jim</Name>
<Name> Jason
Bourne </Name>
<Name> XXXXXX </Name> <Name> Dennis
</Name>
<Name> Elijah
Wood </Name> <Name> Dennis
The Menace </Name>
<Name>Elijah
Wood</Name> <Name>Dennis The
Menace</Name>
$

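Note the replacement partway through the end of the output, on the line with a complete <Name>…</Name> entry followed by a <Name> whose closing tag is on the next line. That line is going to cause headaches for anything more ambitious.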

I've not worked out how the script would handle the various split-line cases, beyond the fact that it would almost certainly need to join lines until the </Name> is caught. It would then do processing closely related to that already shown, but it would need to allow for newlines in the matched material.
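For comparison, the "join lines until the closing tag" idea becomes much simpler when the whole file is handled as a single string. The following Python sketch is a regex-based alternative, not the sed approach and not a real XML parser. It assumes the tags are balanced and not nested, and the names ELEMENT, INLINE and mask_text are invented for this example.

```python
import re

# Element form: <Name> ... </Name>, possibly spanning several lines.
ELEMENT = re.compile(r"(<Name>)(.*?)(</Name>)", re.DOTALL)
# Attribute-like form from cases 3, 4 and 7: <Name ... />.
INLINE = re.compile(r"(<Name\s)([^<>]*?)(/>)")


def _mask(match: re.Match) -> str:
    # Replace every non-whitespace character of the value with X.
    return match.group(1) + re.sub(r"\S", "X", match.group(2)) + match.group(3)


def mask_text(text: str) -> str:
    """Mask all Name values in the contents of a whole file."""
    return INLINE.sub(_mask, ELEMENT.sub(_mask, text))


print(mask_text("<Name> Elijah </Name> <Name> Dennis\n</Name>"))
print(mask_text("  <Name\n     Jim\n       />"))
```

The first call covers the awkward line with a complete entry followed by a split one, and the second covers case 7. To process a file such as mylog.log from the question, read it fully, pass the text to mask_text and write the result back.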
