Using Perl XML::SAX to modify XML documents


Question

I'm trying to use XML::SAX to modify parts of an XHTML document, but all my attempts have failed.

Here is what I am trying to do:

#!/usr/bin/perl 
package MyHandler;
use strict;
use warnings;

use base qw(XML::SAX::Base);
use Data::Dumper;

sub start_element {
    my $self = shift;
    my $data = shift;

    if( $data->{LocalName} eq 'span') {
        $data->{LocalName} = 'naps';
    }

    $self->SUPER::start_element($data); # GOOD (and easy) !
    #print Dumper($data); 
}

1;

#============================
#Main program
#============================
use strict;
use warnings;

use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $out;

my $o = XML::SAX::Writer->new( Output => \$out );
my $h = MyHandler->new( Handler => $o );
my $p = XML::SAX::ParserFactory->parser(Handler => $h);

# Slurp the whole __DATA__ section; $/ must stay localized while reading.
my $data = do { local $/; <DATA> };
$p->parse_string( $data );
print $out;


__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:wicket="http://wicket.apache.org/dtds.data/wicket-xhtml1.4-strict.dtd">
<body>
<wicket:panel>
    <form wicket:id="mvpForm">
        <span>Edit Information: </span>
        <input type="checkbox" wicket:id="editForm"/>

        <span>Name: </span>
        <span wicket:id="name"></span>
        <input type="text" wicket:id="nameEdit"/>

        <span>Last Name: </span>
        <span wicket:id="lastName"></span>
        <input type="text" wicket:id="lastNameEdit"/>

        <span>DOB: </span>
        <span wicket:id="dob"></span>
        <input type="text" wicket:id="dobEdit"/>


        <span>Occupation: </span>
        <span wicket:id="occupation"></span>
        <input type="text" wicket:id="occupationEdit"/>


        <span>Gender: </span>
        <span wicket:id="gender"></span>
        <span wicket:id="genderEdit"/>

        <input type="submit" wicket:id="submit"/>

    </form>
</wicket:panel>
</body>
</html> 

The basic idea is to change every "span" to a "naps" and write the resulting modified XML to STDOUT.
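One possible reason a handler that edits only LocalName has no visible effect: as far as I can tell, XML::SAX::Writer builds its tags from the element's Name key (the qualified name), and the closing tag arrives in a separate end_element event. The following is a minimal sketch under that assumption, not a verified diagnosis. The package name RenameSpan and the helper _renamed are hypothetical names introduced for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package RenameSpan;    # hypothetical filter name, not from the original post
use base qw(XML::SAX::Base);

# Return a renamed shallow copy of a "span" element node, or the node
# unchanged for any other element. Copying avoids mutating the parser's data.
sub _renamed {
    my ($el) = @_;
    return $el unless defined $el->{LocalName} && $el->{LocalName} eq 'span';
    my %copy = %$el;
    $copy{LocalName} = 'naps';
    # Assumption: the writer serializes from Name, so update it too,
    # keeping any namespace prefix on the qualified name.
    $copy{Name} = $copy{Prefix} ? "$copy{Prefix}:naps" : 'naps';
    return \%copy;
}

sub start_element {
    my ($self, $el) = @_;
    $self->SUPER::start_element( _renamed($el) );
}

# The closing tag is a separate event and needs the same rename.
sub end_element {
    my ($self, $el) = @_;
    $self->SUPER::end_element( _renamed($el) );
}

package main;

use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $out    = '';
my $writer = XML::SAX::Writer->new( Output => \$out );
my $filter = RenameSpan->new( Handler => $writer );
my $parser = XML::SAX::ParserFactory->parser( Handler => $filter );

$parser->parse_string('<body><span>Name: </span><input type="text"/></body>');
print $out, "\n";
```

Attributes are passed through untouched, so something like wicket:id on a renamed element should survive.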

Also, it would be nice to know whether it's possible to merge XML chunks using SAX. In other words, if I find a particular element that gets expanded to something else, how can I merge the expansion into the output going to STDOUT?

For example, from:

<xmltag>
    <expandable/>
</xmltag>

to:

<xmltag>
    <expanded>
        This is an expanded element
    </expanded>
</xmltag>

Thanks.

Recommended answer

To answer my own question regarding merging/expanding elements, here is a snippet showing how to do it with SAX:

#!/usr/bin/perl 
package MyHandler;
use strict;
use warnings;

use base qw(XML::SAX::Base);
use Data::Dumper;

use XML::SAX::ParserFactory;
use XML::SAX::Writer;

sub start_element {
    my $self = shift;
    my $data = shift;

    if( $data->{LocalName} eq 'expand') {
        $self->{in_include}++;
        my $p = XML::SAX::ParserFactory->parser( Handler => $self );
        $p->parse_string( "<expanded>This is my expanded tag</expanded>" );
        return;
    }

    #$data->{Attributes} = undef;
    $self->SUPER::start_element($data);
    #print Dumper($data); 
}

sub characters {
    my $self = shift;
    my $data = shift;

    #print "Data is $data->{Data}" if defined $data->{Data}; 
    $self->SUPER::characters($data);
}

sub end_element {
    my ($self, $element) = @_;
    if ($element->{LocalName} eq "expand") {
        $self->{in_include}--;
    } else {
        $self->SUPER::end_element($element);
    }
}

sub start_document { # same for end_document
    my($self, $data) = @_;
    return if($self->{in_include});
    $self->SUPER::start_document($data);
}

sub end_document { # same guard as start_document
    my($self, $data) = @_;
    return if($self->{in_include});
    $self->SUPER::end_document($data);
}

1;

#============================
#Main program
#============================
use strict;
use warnings;

use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $out;

my $o = XML::SAX::Writer->new( Output => \$out );
my $h = MyHandler->new( Handler => $o );
my $p = XML::SAX::ParserFactory->parser(Handler => $h);

# Slurp the whole __DATA__ section; $/ must stay localized while reading.
my $data = do { local $/; <DATA> };
$p->parse_string( $data );
print $out;


__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:wicket="http://wicket.apache.org/dtds.data/wicket-xhtml1.4-strict.dtd">
<body>
<wicket:panel>
    <form wicket:id="mvpForm">
        <span>Edit Information: </span>
        <input type="checkbox" wicket:id="editForm"/>

        <span>Name: </span>
        <span wicket:id="name"></span>
        <input type="text" wicket:id="nameEdit"/>

        <span>Last Name: </span>
        <span wicket:id="lastName"></span>
        <input type="text" wicket:id="lastNameEdit"/>

        <span>DOB: </span>
        <span wicket:id="dob"></span>
        <input type="text" wicket:id="dobEdit"/>

        <span>Occupation: </span>
        <span wicket:id="occupation"></span>
        <input type="text" wicket:id="occupationEdit"/>

        <span>Gender: </span>
        <span wicket:id="gender"></span>
        <span wicket:id="genderEdit"/>

        <input type="submit" wicket:id="submit"/>

        <expand/>

    </form>
</wicket:panel>
</body>
</html> 

The <expand/> tag will be replaced with <expanded>This is my expanded tag</expanded>.

Basically, all that is needed is to create a new parser and hand it a file or string to parse. However, there are a couple of gotchas. The first is to stop propagating the event at the point where you intercept the tag to be expanded. In other words, don't call $self->SUPER::start_element or $self->SUPER::end_element for the tag being expanded; that keeps the replaced tag from ending up in the output. The second is that you must intercept start_document/end_document and skip calling the parent for the nested parse, otherwise the following error will be produced:

Trying to pop context without push context at /usr/share/perl5/XML/NamespaceSupport.pm line 79, chunk 1.

In other words, some cleanup fails:

This message is being triggered because XML::NamespaceSupport does some initialisation on a start_document event and some cleanup on an end_document event. The problem is that with your code there will be a pair of these events for the main document and a nested pair for each included document. When the second end_document event occurs, there is nothing to clean up - hence the message. Taken from here
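Since the error comes from the nested start_document/end_document pair, an alternative worth considering is to skip the nested parser and emit the replacement events directly from the filter. The following is a minimal sketch of that idea, not the method used in the answer above. The package name ExpandInline and the helper _element are hypothetical, and the element node is built by hand with the usual Perl SAX 2 keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ExpandInline;    # hypothetical filter name, not from the original post
use base qw(XML::SAX::Base);

# Build a bare element node with no namespace and no attributes.
sub _element {
    my ($name) = @_;
    return {
        Name         => $name,
        LocalName    => $name,
        Prefix       => '',
        NamespaceURI => '',
        Attributes   => {},
    };
}

sub start_element {
    my ($self, $el) = @_;
    if ( defined $el->{LocalName} && $el->{LocalName} eq 'expand' ) {
        # Emit the replacement directly; no second parser, so no nested
        # start_document/end_document events reach the writer.
        my $new = _element('expanded');
        $self->SUPER::start_element($new);
        $self->SUPER::characters( { Data => 'This is my expanded tag' } );
        $self->SUPER::end_element($new);
        return;    # swallow the original <expand> start event
    }
    $self->SUPER::start_element($el);
}

sub end_element {
    my ($self, $el) = @_;
    # Swallow the matching end event of the replaced tag as well.
    return if defined $el->{LocalName} && $el->{LocalName} eq 'expand';
    $self->SUPER::end_element($el);
}

package main;

use XML::SAX::ParserFactory;
use XML::SAX::Writer;

my $out    = '';
my $writer = XML::SAX::Writer->new( Output => \$out );
my $filter = ExpandInline->new( Handler => $writer );
my $parser = XML::SAX::ParserFactory->parser( Handler => $filter );

$parser->parse_string('<xmltag><expand/></xmltag>');
print $out, "\n";
```

The trade-off is that the replacement has to be written as events instead of as an XML string, so the nested-parser approach is probably still more convenient when the expansion comes from a file or a large fragment.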
