Exclude certain child nodes when data structure is unknown

Question

EDIT - I've figured out the solution to my problem and posted a Q&A here.

I'm looking to process XML conforming to the Library of Congress EAD standard (found here). Unfortunately, the standard is very loose regarding the structure of the XML.

For example the <bioghist> tag can exist within the <archdesc> tag, or within a <descgrp> tag, or nested within another <bioghist> tag, or a combination of the above, or can be left out entirely. I've found it to be very difficult to select just the bioghist tag I'm looking for without also selecting others.

Below are a few different possible EAD XML documents my XSLT might have to process:

First example

<ead>
<eadheader>
    <archdesc>
        <bioghist>one</bioghist>
        <dsc>
            <c01>
                <descgrp>
                    <bioghist>two</bioghist>
                </descgrp>
                <c02>
                    <descgrp>
                        <bioghist>
                            <bioghist>three</bioghist>
                        </bioghist>
                    </descgrp>
                </c02>
            </c01>
        </dsc>
    </archdesc>
</eadheader>
</ead>

Second example

<ead>
<eadheader>
    <archdesc>
        <descgrp>
            <bioghist>
                <bioghist>one</bioghist>
            </bioghist>
        </descgrp>
        <dsc>
            <c01>
                <c02>
                    <descgrp>
                        <bioghist>three</bioghist>
                    </descgrp>
                </c02>
                <bioghist>two</bioghist>
            </c01>
        </dsc>
    </archdesc>
</eadheader>
</ead>

Third example

<ead>
<eadheader>
    <archdesc>
        <descgrp>
            <bioghist>one</bioghist>
        </descgrp>
        <dsc>
            <c01>
                <c02>
                    <bioghist>three</bioghist>
                </c02>
            </c01>
        </dsc>
    </archdesc>
</eadheader>
</ead>

As you can see, an EAD XML file might have a <bioghist> tag almost anywhere. The actual output I'm supposed to produce is too complicated to post here. A simplified example of the output for the above three EAD examples might look like this:

Output for the first example

<records>
<primary_record>
    <biography_history>first</biography_history>
</primary_record>
<child_record>
    <biography_history>second</biography_history>
</child_record>
<granchild_record>
    <biography_history>third</biography_history>
</granchild_record>
</records>

Output for the second example

<records>
<primary_record>
    <biography_history>first</biography_history>
</primary_record>
<child_record>
    <biography_history>second</biography_history>
</child_record>
<granchild_record>
    <biography_history>third</biography_history>
</granchild_record>
</records>

Output for the third example

<records>
<primary_record>
    <biography_history>first</biography_history>
</primary_record>
<child_record>
    <biography_history></biography_history>
</child_record>
<granchild_record>
    <biography_history>third</biography_history>
</granchild_record>
</records>

If I want to pull the "first" bioghist value and put it in the <primary_record>, I can't simply use <xsl:apply-templates select="/ead/eadheader/archdesc/bioghist"/>, as that tag might not be a direct child of the <archdesc> tag. It might be wrapped by a <descgrp> or a <bioghist> or a combination thereof. And I can't use select="//bioghist", because that will pull all the <bioghist> tags. I can't even use select="//bioghist[1]", because there might not actually be a <bioghist> tag at the <archdesc> level, and then I'd be pulling the value below the <c01>, which is "second" and should be processed later.

This is already a long post, but one other wrinkle is that there can be an unlimited number of <cxx> nodes, nested up to twelve levels deep. I'm currently processing them recursively. I've tried saving the node I'm currently processing (<c01> for example) as a variable called 'RN', then running <xsl:apply-templates select=".//bioghist [name(..)=name($RN) or name(../..)=name($RN)]"/>. This works for some forms of EAD, where the <bioghist> tag isn't nested too deeply, but it will fail as soon as it has to process an EAD file created by someone who loves wrapping tags in other tags (which is totally fine according to the EAD standard).

What I'd like is some way to say:

  • Get any <bioghist> tag anywhere below the current node but
  • don't dig deeper if you hit a <c??> tag
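
One way to express those two rules is a recursive walk in a dedicated template mode, with an empty template that halts the walk at any numbered component. The following is only a minimal XSLT 1.0 sketch under several assumptions, not the solution the author eventually posted: the <record> element, its level attribute, and the mode names "record" and "own-bioghist" are hypothetical, and components are assumed to be named "c" followed by a number (c01 through c12).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- One record for archdesc and one per numbered component, in document
       order. number('01') is 1; number('hronlist') is NaN, and NaN > 0 is
       false, so names such as chronlist or container are not matched. -->
  <xsl:template match="/">
    <records>
      <xsl:apply-templates mode="record"
          select="//archdesc | //*[starts-with(name(), 'c') and number(substring(name(), 2)) &gt; 0]"/>
    </records>
  </xsl:template>

  <!-- Hypothetical output shape: <record level="c01">...</record>. -->
  <xsl:template match="*" mode="record">
    <record level="{name()}">
      <biography_history>
        <xsl:apply-templates select="*" mode="own-bioghist"/>
      </biography_history>
    </record>
  </xsl:template>

  <!-- Default: keep descending through wrappers of any depth
       (descgrp, dsc, an outer bioghist, and so on). -->
  <xsl:template match="*" mode="own-bioghist">
    <xsl:apply-templates select="*" mode="own-bioghist"/>
  </xsl:template>

  <!-- Stop at a nested component: anything below it belongs to that
       component's own record. The empty template outputs nothing. -->
  <xsl:template mode="own-bioghist"
      match="*[starts-with(name(), 'c') and number(substring(name(), 2)) &gt; 0]"/>

  <!-- Innermost bioghist (one that wraps no further bioghist): emit its text. -->
  <xsl:template match="bioghist[not(bioghist)]" mode="own-bioghist">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against the first example, this sketch should produce three records holding "one", "two", and "three" for archdesc, c01, and c02 respectively; for the third example the c01 record gets an empty <biography_history>, because the walk stops at c02 before reaching its <bioghist>. Because the walk follows child elements rather than testing name(..) or name(../..), the depth of wrapping no longer matters. If a level holds several <bioghist> elements, their text is concatenated without a separator, which a real stylesheet would probably want to handle differently.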

I hope that I've made the situation clear. Please let me know if I've left anything ambiguous. Any assistance you can provide would be greatly appreciated. Thanks.

Recommended answer

I worked out a solution on my own and posted it at this Q&A because the solution is quite specific to a certain XML standard and seemed out of the scope of this question. If people feel it would be best to post it here as well, I can update this answer with a copy.
