Want to split a UNIX XML file based on tags
Problem description
I have an XML file with batches like the one below.
I want to use a shell script to split this file into 5 files, one per <Item> element. Please help, thanks in advance.
<Items>
  <Item>
    <Title>Title 1</Title>
    <DueDate>01-02-2008</DueDate>
  </Item>
  <Item>
    <Title>Title 2</Title>
    <DueDate>01-02-2009</DueDate>
  </Item>
  <Item>
    <Title>Title 3</Title>
    <DueDate>01-02-2010</DueDate>
  </Item>
  <Item>
    <Title>Title 4</Title>
    <DueDate>01-02-2011</DueDate>
  </Item>
  <Item>
    <Title>Title 5</Title>
    <DueDate>01-02-2012</DueDate>
  </Item>
</Items>
Desired output (the first of the 5 files):
<Items>
  <Item>
    <Title>Title 1</Title>
    <DueDate>01-02-2008</DueDate>
  </Item>
</Items>
Recommended answer
I would suggest installing XML::Twig, which includes the rather handy xml_split utility. That may do what you need, e.g.:
xml_split -c Item yourfile.xml
However, what you're trying to accomplish isn't especially easy, because you need to cut the document up while keeping each piece valid, well-structured XML. You can't do that reliably with standard line- or regex-based tools.
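To see why, here is a hypothetical line-based splitter, sketched in Python rather than shell purely for illustration. It treats a line that reads exactly <Item> as the start of a chunk, the way a quick awk or csplit script might. It works on the sample as posted, but it finds nothing once the same document is serialized on a single line, which is still perfectly legal XML.

```python
def naive_line_split(xml_text):
    """Hypothetical line-based splitter (illustration only): a chunk starts
    at a line that is exactly '<Item>' and ends at a line that is exactly
    '</Item>'. It knows nothing about XML structure."""
    chunks, current = [], None
    for line in xml_text.splitlines():
        stripped = line.strip()
        if stripped == "<Item>":
            current = [stripped]
        elif current is not None:
            current.append(stripped)
            if stripped == "</Item>":
                chunks.append("\n".join(current))
                current = None
    return chunks


pretty = ("<Items>\n<Item>\n<Title>Title 1</Title>\n</Item>\n"
          "<Item>\n<Title>Title 2</Title>\n</Item>\n</Items>\n")
compact = pretty.replace("\n", "")  # the same document, all on one line

print(len(naive_line_split(pretty)))   # 2
print(len(naive_line_split(compact)))  # 0
```

Attributes on <Item>, comments, or CDATA sections break it in similar ways; a parser sees the same element tree regardless of layout.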
But you can use a parser:
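#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Every <Item> element, collected as the parser encounters it.
my @item_list;

sub cut_item {
    my ( $twig, $item ) = @_;
    # Detach the element from the parsed tree and keep it for later.
    push( @item_list, $item->cut );
}

my $twig = XML::Twig->new(
    twig_handlers => { 'Item' => \&cut_item }
);

# Slurp the whole input (STDIN, or the file named on the command line)
# into one string; parse(<>) would pass a list of lines instead.
my $xml = do { local $/; <> };
$twig->parse($xml);

my $itemcount = 1;
foreach my $element (@item_list) {
    # Build a new document with an <Items> root and paste the item into it.
    my $newdoc = XML::Twig->new( 'pretty_print' => 'indented_a' );
    $newdoc->set_root( XML::Twig::Elt->new('Items') );
    $element->paste( $newdoc->root );

    # Also echo the document to STDOUT.
    $newdoc->print;

    # Write it to items_1.xml, items_2.xml, ...
    my $filename = "items_" . $itemcount++ . ".xml";
    open( my $output, ">", $filename ) or die "Cannot open $filename: $!";
    print {$output} $newdoc->sprint;
    close($output) or die "Cannot close $filename: $!";
}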
This uses the XML::Twig library to extract each Item element from your XML (piped on STDIN, or passed as a file name: myscript.pl yourfilename).
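It then iterates over all the items it found, wraps each one in an Items root element, and writes it to its own file (items_1.xml, items_2.xml, and so on), echoing it to STDOUT as well. If your real root element is more complex (attributes, namespaces, extra header elements), this approach needs a little more fiddling, but it adapts easily.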
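If installing XML::Twig isn't an option, the same cut-and-wrap idea can be sketched with Python's standard-library xml.etree.ElementTree. This is a swapped-in tool, not part of the answer above. The names split_items and write_items and the items_ prefix are illustrative assumptions. The wrapper copies the original root's tag and attributes, which covers the simplest "more complex root" case; namespaced documents would need the {uri}Item form of the tag.

```python
import copy
import xml.etree.ElementTree as ET


def split_items(xml_text, item_tag="Item"):
    """Return one serialized XML document per <item_tag> child of the root.

    Each document is wrapped in a new element carrying the same tag and
    attributes as the original root (e.g. <Items>)."""
    root = ET.fromstring(xml_text)
    docs = []
    for item in root.findall(item_tag):  # direct children of the root only
        wrapper = ET.Element(root.tag, dict(root.attrib))
        wrapper.append(copy.deepcopy(item))
        docs.append(ET.tostring(wrapper, encoding="unicode"))
    return docs


def write_items(xml_text, prefix="items_"):
    """Write items_1.xml, items_2.xml, ... (hypothetical naming that mirrors
    the Perl script)."""
    for n, doc in enumerate(split_items(xml_text), start=1):
        with open(f"{prefix}{n}.xml", "w", encoding="utf-8") as fh:
            fh.write(doc + "\n")
```

For the sample in the question, calling write_items on the file's contents would produce items_1.xml through items_5.xml. Like the Perl script, this reads the whole document into memory before splitting.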