Perl remove duplicate XML tags
Question
I have the following XML file:
<d:entry id="a" d:title="a">
<d:index d:value="a" d:title="a"/>
<d:index d:value="b" d:title="b"/>
<d:index d:value="a" d:title="a"/>
<d:index d:value="c" d:title="c"/>
<d:index d:value="b" d:title="b"/>
<d:index d:value="a" d:title="a"/>
<d:index d:value="b" d:title="b"/>
<div>This is the content for entry.</div>
</d:entry>
<d:entry id="b" d:title="b">
<d:index d:value="a" d:title="a"/>
<d:index d:value="b" d:title="b"/>
<div>This is the content for entry.</div>
</d:entry>
(Whitespace added for readability.)
There are some duplicate <d:index> elements. I need to get rid of all the duplicates and keep only one of each unique <d:index> within an entry. The desired result looks like this:
<d:entry id="a" d:title="a">
<d:index d:value="a" d:title="a"/>
<d:index d:value="b" d:title="b"/>
<d:index d:value="c" d:title="c"/>
<div>This is the content for entry.</div>
</d:entry>
<d:entry id="b" d:title="b">
<d:index d:value="a" d:title="a"/>
<d:index d:value="b" d:title="b"/>
<div>This is the content for entry.</div>
</d:entry>
I can do this with regex replacements in some editors, but the replacement has to be run multiple times. I was wondering whether Perl has a way to do this in one run.
Recommended answer
The following is a common way to filter out duplicates:
my @filtered = grep { !$seen{$_}++ } @unfiltered;
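As a quick illustration of why that one-liner works, here is a minimal, self-contained sketch. The sample list is made up for the example; the only thing taken from the answer is the grep/%seen idiom itself.

```perl
use strict;
use warnings;

# Hypothetical sample data, mirroring the d:value attributes of entry "a".
my @unfiltered = qw(a b a c b a b);

# %seen counts how many times each value has been encountered so far.
my %seen;

# $seen{$_}++ returns the count *before* incrementing, so the negated
# expression is true only the first time a given value shows up.
my @filtered = grep { !$seen{$_}++ } @unfiltered;

print "@filtered\n";   # prints "a b c"
```

The first occurrence of each value is kept and the original order is preserved, which matches the desired output in the question.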
This can be adapted to your needs, as shown in the following snippet:
my %seen;  # start with a fresh hash for each d:entry
for my $index_node ($xpc->findnodes('d:index', $entry_node)) {
    my $value = $xpc->findvalue('@d:value', $index_node);
    my $title = $xpc->findvalue('@d:title', $index_node);
    # Detach every d:index whose value/title pair has already been seen.
    if ($seen{$value}{$title}++) {
        $index_node->unbindNode();
    }
}
(I used my preferred parser, XML::LibXML, since you didn't mention which parser you were using.)
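To put the snippet in context, here is a minimal end-to-end sketch using XML::LibXML. Several things in it are assumptions rather than facts from the question: the wrapping <d:dictionary> root element, the namespace URI bound to the d prefix (a placeholder, since the question does not show the declaration), and the inline sample document, which is shortened from the one above. It uses unbindNode, the documented XML::LibXML::Node method for detaching a node from its parent.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Assumption: placeholder namespace URI; use the one your file declares.
my $ns_uri = 'http://example.com/ns/d';

# Assumption: the entries live under a single root that declares "d".
my $xml = <<"XML";
<d:dictionary xmlns:d="$ns_uri">
<d:entry id="a" d:title="a">
<d:index d:value="a" d:title="a"/>
<d:index d:value="b" d:title="b"/>
<d:index d:value="a" d:title="a"/>
<d:index d:value="c" d:title="c"/>
<d:index d:value="b" d:title="b"/>
<div>This is the content for entry.</div>
</d:entry>
<d:entry id="b" d:title="b">
<d:index d:value="a" d:title="a"/>
<d:index d:value="b" d:title="b"/>
<div>This is the content for entry.</div>
</d:entry>
</d:dictionary>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# XPath needs the prefix registered explicitly, even if the file declares it.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(d => $ns_uri);

for my $entry_node ($xpc->findnodes('//d:entry')) {
    my %seen;  # reset per entry, so duplicates are judged within one entry only
    for my $index_node ($xpc->findnodes('d:index', $entry_node)) {
        my $value = $xpc->findvalue('@d:value', $index_node);
        my $title = $xpc->findvalue('@d:title', $index_node);
        # Keep the first occurrence of each value/title pair, detach the rest.
        $index_node->unbindNode() if $seen{$value}{$title}++;
    }
}

print $doc->toString();
```

Declaring %seen inside the outer loop is what keeps entry "b" from losing its own "a" and "b" index elements. Note that the whitespace text nodes around removed elements stay in place, so the output may contain blank lines where duplicates used to be.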