Remove whitespace from XML tags

Question

I'm trying to write a perl script that removes whitespace from XML tags, but leaves whitespace inside of the values. For example, let's say I have:

<Example>This is an example.</Exampl   e>

What I'm looking to accomplish is to knock off the whitespace specifically in </Exampl e>. Since this will be working on an entire XML document, I figured I'd do something with the substitution operator, but I can't quite figure out how to only match whitespace that might be inside of the XML tags themselves.

Any help is greatly appreciated!

I've added a real example of what is occurring:

not well-formed (invalid token) at line 42, column 25, byte 1456:
                    <Artist>Eminem</Artist>
                    <FileName>eminem feat lil wayne - no love -
hotnewhiphop com(2).mp3</    FileName>
========================^
                    <FileSize>4804478</FileSize>

Recommended answer

s!(</?\w+)\s+(\w+\s*/?>)!$1$2!g;
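As a minimal sketch, a substitution of this kind can be wrapped in a helper and run against the examples from the question. The helper name `fix_simple_tags` and the extra first substitution (for whitespace directly after `<` or `</`, as in the `</    FileName>` case) are assumptions added for illustration, not part of the original answer. The sketch allows optional whitespace before the closing `>` and only targets tags without attributes.

```perl
use strict;
use warnings;

# Hypothetical helper: repairs whitespace in attribute-free tag names only.
sub fix_simple_tags {
    my ($xml) = @_;
    # Drop whitespace between "<" or "</" and the start of the tag name.
    $xml =~ s{<(/?)\s+(?=\w)}{<$1}g;
    # Join a tag name that was split in two, e.g. "</Exampl   e>".
    $xml =~ s{(</?\w+)\s+(\w+\s*/?>)}{$1$2}g;
    return $xml;
}

print fix_simple_tags('<Example>This is an example.</Exampl   e>'), "\n";
print fix_simple_tags('<FileName>no love.mp3</    FileName>'), "\n";
```

Because both patterns start at `<`, whitespace in the text between tags is left alone. A tag such as `<tag attr="x">` is also untouched, since the second word is followed by `=` rather than `>`.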

If you want to actually leave whitespace in a tag with attributes, it gets more complex, because whitespace is a legitimate character in a tag. You pretty much have to find the "words" with no equals or space + equals after them and marry them to the previous--unquoted--word.

sub marry_inner_splits {
    local $_ = shift;    # lexical "my $_" was removed in Perl 5.24
    # fix broken tags
    s|^(/?\w+)\s+(\w+)\b(?!\s*=)|$1$2|;    # keep the leading "/" of a closing tag
    # find the resulting position.
    my $pos = index( $_, ' ' );
    # return if there is no whitespace.
    return $_ if $pos == -1;
    # bind the rest of the text to the substring
    substr( $_, $pos ) =~ s/(\s*\w+)\s+(\w+\s*=\s*(?:"[^"]+"|'[^']+')\s*)/$1$2/g;
    return $_;
}

my $tag_str = q{Some stuff before the tag <ta g attr1="val1" att   r2="value #2"     /></Escap   e>};
$tag_str =~ s/<([^>]+)>/'<' . marry_inner_splits($1) . '>'/ge;

The e flag means that the replacement part is evaluated as Perl code rather than treated as a string, which is what lets it call marry_inner_splits on each tag.
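To illustrate the difference the flag makes, here is a small sketch with made-up input (the string `width=3 height=4` and the variable names are only for illustration). The same substitution is run once without and once with the flag:

```perl
use strict;
use warnings;

my $text = 'width=3 height=4';

# Without /e the replacement is an interpolated string.
(my $plain = $text) =~ s/(\d+)/$1 * 2/g;

# With /e the replacement is evaluated as a Perl expression.
(my $evaluated = $text) =~ s/(\d+)/$1 * 2/ge;

print "$plain\n";      # width=3 * 2 height=4 * 2
print "$evaluated\n";  # width=6 height=8
```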
