Remove XML comments using Regex in bash


Question

I want to remove XML comments in bash using regex (awk, sed, grep...). I have looked at other questions about this, but they are missing something. Here's my XML code:

<Table>
    <!--
   to be removed bla bla bla bla bla bl............

    removeee

    to be removeddddd
    -->

<row>
        <column name="example"  value="1" ></column>
    </row>
</Table>

So I'm comparing two XML files, but I don't want the comparison to take the comments into account. I do this:

diff file1.xml file2.xml | sed '/<!--/,/-->/d'

but that only removes the line that starts with <!-- and the last line. It does not remove all the lines in between.

Recommended answer

In the end, you're going to have to recommend to your client/friend/instructor that they install some kind of XML processor. xmlstarlet is a good command-line tool, but there are any number (or at least some number greater than 2) of implementations of XSLT which can be compiled for any standard Unix, and in most cases also for Windows. You really cannot do much XML processing with regex-based tools, and whatever you do will be hard to read, harder to maintain, and likely to fail on corner cases, sometimes with disastrous consequences.
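As a rough sketch of what the XML-processor route could look like, assuming xmlstarlet is installed and on the PATH (the helper name strip_comments_xml is made up for illustration): its ed subcommand with -d deletes every node matched by an XPath expression, and //comment() selects all comment nodes.

```shell
# Hypothetical helper: print an XML file with all comment nodes deleted.
# Assumes xmlstarlet is installed; "ed -d XPATH" deletes the matching nodes
# and writes the edited document to standard output.
strip_comments_xml() {
    xmlstarlet ed -d '//comment()' "$1"
}

# Possible usage in bash with the file names from the question:
# diff <(strip_comments_xml file1.xml) <(strip_comments_xml file2.xml)
```

Because the document is parsed rather than pattern-matched, this should not depend on how a comment is laid out across lines, but the input has to be well-formed XML for the parser to accept it.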

I haven't spent a lot of time polishing or reviewing the following little awk program. I think it will remove comments from compliant XML documents. Note that the following comment is not compliant:

<!-- XML comments cannot include -- so this comment is illegal -->

and it will not be treated correctly by my script.

The following is also illegal, but since I've seen it in the wild and it wasn't hard to deal with, I did so:

<!-------------- This comment is ill-formed but... -------------->

Here it is. No guarantees. I know that it's hard to read, and I wouldn't want to maintain it. It may well fail on arbitrary corner cases.

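awk '
     # Inside a multi-line comment and this line closes it: drop the rest of the comment.
     in_comment && /-->/ { sub(/([^-]|-[^-])*--+>/, ""); in_comment = 0 }
     # Still inside a multi-line comment: skip the whole line.
     in_comment { next }
     # Remove complete comments on this line, then cut off a comment left open.
     { gsub(/<!--+([^-]|-[^-])*--+>/, "")
       in_comment = sub(/<!--+.*/, "")
       print }'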
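For the comparison in the question, the filter has to run on each file before diff sees it, not on diff's output. A minimal sketch of that, wrapping the awk program above in a shell function (the name strip_comments is an assumption made for illustration):

```shell
# Hypothetical wrapper around the awk program above: reads the named files
# (or standard input if none are given) and prints them without XML comments.
strip_comments() {
    awk 'in_comment&&/-->/{sub(/([^-]|-[^-])*--+>/,"");in_comment=0}
         in_comment{next}
         {gsub(/<!--+([^-]|-[^-])*--+>/,"");
          in_comment=sub(/<!--+.*/,"");
          print}' "$@"
}

# Possible usage in bash (process substitution), with the question's file names:
# diff <(strip_comments file1.xml) <(strip_comments file2.xml)
```

A line that held only the start or end of a comment is printed as an empty or whitespace-only line, so a comment that spans a different number of lines in each file may still produce a difference in the diff; the same caveats about non-compliant comments apply.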
