Want to split a UNIX XML file based on tags


Question

I have an XML file containing batches of items, shown below.

I want to split this file into 5 files based on the tags (one file per Item), using shell scripting. Please help; thanks in advance.

<Items>
<Item>
<Title>Title 1</Title>
<DueDate>01-02-2008</DueDate>
</Item>
<Item>
<Title>Title 2</Title>
<DueDate>01-02-2009</DueDate>
</Item>
<Item>
<Title>Title 3</Title>
<DueDate>01-02-2010</DueDate>
</Item>
<Item>
<Title>Title 4</Title>
<DueDate>01-02-2011</DueDate>
</Item>
<Item>
<Title>Title 5</Title>
<DueDate>01-02-2012</DueDate>
</Item>
</Items>

Desired output (the first of the 5 files):

<Items>
<Item>
<Title>Title 1</Title>
<DueDate>01-02-2008</DueDate>
</Item>
</Items>

Recommended answer

I would suggest installing XML::Twig, which includes the rather handy xml_split utility. That may do what you need. For example:

xml_split -c Item

However, what you're trying to accomplish isn't especially easy, because you need to cut the file up while retaining the XML structure. You can't do that reliably with standard line-based or regex-based tools.

But you can use a parser:

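#!/usr/bin/env perl

use strict;
use warnings;
use XML::Twig;

my @item_list;

# Handler: detach each completed Item element from the tree and keep it.
sub cut_item {
    my ( $twig, $item ) = @_;
    my $thing = $item->cut;
    push( @item_list, $thing );
}

my $twig = XML::Twig->new(
    twig_handlers => { 'Item' => \&cut_item }
);

# Slurp the whole input (a file named on the command line, or STDIN) into
# one string; a bare <> in list context would hand parse() one line per
# argument instead of the full document.
$twig->parse( do { local $/; <> } );

my $itemcount = 1;

foreach my $element (@item_list) {
    # Build a fresh document with an Items root and attach the Item to it.
    my $newdoc = XML::Twig->new( 'pretty_print' => 'indented_a' );
    $newdoc->set_root( XML::Twig::Elt->new('Items') );

    $element->paste( $newdoc->root );

    # Echo to STDOUT, then write the same document to items_N.xml.
    $newdoc->print;
    my $filename = "items_" . $itemcount++ . ".xml";
    open( my $output, ">", $filename ) or die "Cannot open $filename: $!";
    print {$output} $newdoc->sprint;
    close($output) or die "Cannot close $filename: $!";
}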

This uses the XML::Twig library to extract each of the Item elements from your XML (piped on STDIN, or read from a file named on the command line, as in myscript.pl yourfilename).

It then iterates over all the elements it found, wraps each one in an Items root element, and writes it to a separate file. This approach might take a little more fiddling if you had a more complex root, but it is adaptable if you do.
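
As one possible adaptation for a more complex root, here is a minimal sketch that reuses the original root's tag name and attributes instead of hard-coding Items. It assumes the elements to split are direct children of the root. The helper name split_by_tag and the items_N.xml output names are illustrative choices, not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: return one XML string per <$child_tag> child of the
# root, each wrapped in a copy of the original root (same tag, same attributes).
sub split_by_tag {
    my ( $xml, $child_tag ) = @_;

    my $twig = XML::Twig->new;
    $twig->parse($xml);
    my $root = $twig->root;

    my @docs;
    foreach my $child ( $root->children($child_tag) ) {
        # Copy the attribute hash so the new root does not share it
        # with the original root (atts may be undef if there are none).
        my %atts     = %{ $root->atts || {} };
        my $new_root = XML::Twig::Elt->new( $root->tag, \%atts );

        $child->cut;                  # detach from the original tree
        $child->paste($new_root);     # attach as first child of the copy
        push @docs, $new_root->sprint;
    }
    return @docs;
}

# Run only when executed as a script, not when loaded from another file.
unless (caller) {
    my $xml   = do { local $/; <> };    # slurp file argument or STDIN
    my $count = 1;
    foreach my $doc ( split_by_tag( $xml, 'Item' ) ) {
        my $filename = "items_" . $count++ . ".xml";
        open( my $out, '>', $filename ) or die "Cannot open $filename: $!";
        print {$out} $doc, "\n";
        close($out) or die "Cannot close $filename: $!";
    }
}
```

Because XML::Twig keeps namespace declarations as ordinary attributes unless namespace processing is switched on, copying the root's attributes should carry them over as well. Deeper wrapper elements between the root and the items would need to be rebuilt in the same way.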
