Using wildcards with sed

Views: 79 · Published: 2021/7/17 21:07:52 · Tags: regex, sed

Problem description

I have a log file that has embedded xml amongst normal STDOUT in it as follows:

2015-05-06 04:07:37.386 [INFO]Process:102 - Application submitted Successfully ==== 1
<APPLICATION><FirstName>Test</FirstName><StudentSSN>123456789</StudentSSN><Address>123 Test Street</Address><ParentSSN>123456780</ParentSSN><APPLICATIONID>2</APPLICATIONID></APPLICATION>
2015-05-06 04:07:39.386 [INFO] Process:103 - Application completed Successfully ==== 1
2015-05-06 04:07:37.386 [INFO]Process:104 - Application submitted Successfully ==== 1
<APPLICATION><FirstName>Test2</FirstName><StudentSSN>323456789</StudentSSN><Address>234 Test Street</Address><ParentSSN>123456780</ParentSSN><APPLICATIONID>2</APPLICATIONID></APPLICATION>
2015-05-06 04:07:39.386 [INFO] Process:105 - Application completed Successfully ==== 1

I am successfully parsing this file using a solution provided to me in "Parsing and manipulating log file with embedded xml". As per that post, I use a .sed file with commands as follows:

s|<FirstName>[^<]*</FirstName>|<FirstName>***</FirstName>|
s|<StudentSSN>[^<]*</StudentSSN>|<StudentSSN>***</StudentSSN>|
s|<Address>[^<]*</Address>|<Address>***</Address>|
s|<ParentSSN>[^<]*</ParentSSN>|<ParentSSN>***</ParentSSN>|

My question is: is there a way to do a wildcard match in the foo.sed file shown above? For example, I would like to match all *SSN tags and replace their contents with **, rather than having one line for StudentSSN and another for ParentSSN, and still yield the output below:

2015-05-06 04:07:37.386 [INFO]Process:102 - Application submitted Successfully ==== 1
<APPLICATION><FirstName>***</FirstName><StudentSSN>***</StudentSSN><Address>*******</Address><ParentSSN>*********</ParentSSN>   <APPLICATIONID>2</APPLICATIONID></APPLICATION>
2015-05-06 04:07:39.386 [INFO] Process:103 - Application completed Successfully ==== 1
2015-05-06 04:07:37.386 [INFO]Process:104 - Application submitted Successfully ==== 1
<APPLICATION><FirstName>***</FirstName><StudentSSN>*********</StudentSSN><Address>*****</Address><ParentSSN>*********</ParentSSN>   <APPLICATIONID>2</APPLICATIONID></APPLICATION>
2015-05-06 04:07:39.386 [INFO] Process:105 - Application completed Successfully ==== 1

Thanks in advance.

Recommended answer

choroba's helpful answer works well with GNU sed, because using \| for alternation in a basic regular expression (implied by the absence of the -r option) is only supported there.

Also, the OP has since expressed a desire to use patterns to match similar element names.

Here's a solution that makes use of extended regular expressions, which should work on both Linux (GNU Sed) and BSD/OSX platforms (BSD Sed):

sed -E 's%<([^/>]*Name|[^/>]*SSN|Address[^>]*)>[^<]*%<\1>***%g' file

Note:

It is important to match the variable parts of the element names with a bracket expression that excludes > rather than with .*, so that a match cannot run past the end of the tag.
BSD/OSX extended regular expressions (in accordance with POSIX extended regular expressions) do not support backreferences inside the regular expression itself (as opposed to the "backreferences" that refer to capture-group matches in the replacement string), so no attempt is made to match the closing tag with one.
While this command works on the stated platforms, it is not POSIX-compliant, because POSIX only mandates support for basic regular expressions in Sed.
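
As a minimal sketch of the approach, the snippet below applies the extended-regex substitution to one XML line copied from the question. The variable names line and masked are only illustrative, and the pattern writes the variable part of the name as [^/>]* on the assumption that closing tags such as </FirstName>, which also end in Name, should be left alone:

```shell
# Sample line taken from the log in the question.
line='<APPLICATION><FirstName>Test</FirstName><StudentSSN>123456789</StudentSSN><Address>123 Test Street</Address><ParentSSN>123456780</ParentSSN><APPLICATIONID>2</APPLICATIONID></APPLICATION>'

# Mask the text of every element whose name ends in Name or SSN,
# or starts with Address; all other elements pass through unchanged.
# [^/>]* keeps the name match inside an opening tag (no "/" allowed).
masked=$(printf '%s\n' "$line" |
  sed -E 's%<([^/>]*Name|[^/>]*SSN|Address[^>]*)>[^<]*%<\1>***%g')

printf '%s\n' "$masked"
```

Log lines without XML contain no matching tags, so they are printed as they are.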

The above command is the equivalent of the following GNU Sed command using a basic regular expression - note the need to escape (, ), and |:

sed 's%<\([^/>]*Name\|[^/>]*SSN\|Address[^>]*\)>[^<]*%<\1>***%g' file

Note that it is the use of alternation (\|) that makes this command non-portable, because POSIX basic regular expressions do not support it.
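
If strict portability matters, one possible workaround (a sketch, not part of the original answer) is to avoid alternation altogether and give each name pattern its own substitution, which stays within POSIX basic regular expressions. The function name mask_portable is hypothetical:

```shell
# Reads log text on stdin and masks the same elements as the commands
# above, using only POSIX basic regular expressions (no \| and no -E).
mask_portable() {
  sed -e 's%<\([^/>]*Name\)>[^<]*%<\1>***%g' \
      -e 's%<\([^/>]*SSN\)>[^<]*%<\1>***%g' \
      -e 's%<\(Address[^>]*\)>[^<]*%<\1>***%g'
}

# Example: mask a single line from the question's log.
printf '%s\n' '<FirstName>Test2</FirstName><StudentSSN>323456789</StudentSSN><Address>234 Test Street</Address><APPLICATIONID>2</APPLICATIONID>' |
  mask_portable
```

For a whole log file, the function can be fed with input redirection, for example mask_portable < file. The price of portability is one expression per name pattern, which is close to the original foo.sed but still covers StudentSSN and ParentSSN with a single line.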
