XSLT to process XML with very loose standards (EAD)


Question

I've been having a hell of a week trying to write XSLT code that can process XML documents that conform to the (very permissive) EAD standards.

The useful information in an EAD document is hard to locate precisely. Different EAD documents can place the same bit of information in entirely different parts of the data tree. In addition, within a single EAD document, the same tag can be used numerous times in different locations for different information. For an example of this, please see this SO post. This makes it hard to design a single XSLT file that properly handles these different files.

In general terms, the problem can be described as:

  • How do I select a specific EAD node which is in an unknown location,
  • Without accidentally selecting unwanted nodes that have the same name()?
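
To make the problem concrete, here is a minimal sketch rather than part of the original post. It assumes an EAD file without a namespace (as in DTD-based EAD 2002), and the output element names `titles`, `all`, `main` and `title` are made up for illustration. It contrasts a name-only selection with one that excludes everything under `dsc`, wherever the remaining nodes happen to sit:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <titles>
      <!-- Name-only selection: counts every unittitle in the document,
           including those that belong to component (child) records -->
      <all count="{count(//unittitle)}"/>
      <main>
        <!-- Location-independent but filtered: any unittitle that is
             not nested somewhere below a dsc element -->
        <xsl:for-each select="//unittitle[not(ancestor::dsc)]">
          <title><xsl:value-of select="normalize-space(.)"/></title>
        </xsl:for-each>
      </main>
    </titles>
  </xsl:template>
</xsl:stylesheet>
```

The `ancestor::dsc` filter only works if the unwanted duplicates all live under `dsc`, which is an assumption about the input rather than a guarantee of the EAD standard.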

I've finally put together the XSLT I needed and thought it would be best to drop a generic version of the code here so others can benefit from it or improve upon it.

I'd love to tag this question with an "EAD" tag, but I don't have enough rep. If anyone with the appropriate amount of rep thinks it would be useful, please do so.

Recommended answer

First a quick description of the solution, followed by the code.

  1. Check if this EAD document contains component (child) records (designated with a <cXX> tag). If not, we don't have to worry about duplicate EAD tags. The tags can still be buried under arbitrary wrappers. To find them, see step 3.
  2. If child records exist, be careful not to process the <dsc> tag until the other tags are processed. To find the other tags, see step 3, then step 4 to process child records.
  3. Recurse through the various wrappers with a template that matches them and calls apply-templates on any element node farther down the tree.
  4. We are now processing a child record. Do this by repeating step 2 (carefully process all other tags before tackling the children of this child record), then apply step 4 again to each of its own child records.
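
For orientation, the steps above assume input shaped roughly like the following. This is a hypothetical, heavily trimmed document written for illustration; the titles and identifiers are invented, and it is not meant to validate against the EAD DTD:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample input; all values are made up -->
<ead>
  <eadheader>
    <filedesc>
      <titlestmt>
        <!-- information unit buried under three wrappers (step 3) -->
        <titleproper>Example Family Papers</titleproper>
      </titlestmt>
    </filedesc>
  </eadheader>
  <archdesc level="collection">
    <did>
      <!-- unittitle and unitid of the main record -->
      <unittitle>Example Family Papers</unittitle>
      <unitid>MS-001</unitid>
    </did>
    <!-- dsc holds the child records and must be handled last (step 2) -->
    <dsc>
      <c01 level="series">
        <did>
          <!-- same tag name as above, different record -->
          <unittitle>Correspondence</unittitle>
          <!-- no unitid here, so one has to be derived from the
               parent's unitid and this component's position -->
        </did>
        <!-- a child of a child record (step 4 applied again) -->
        <c02 level="file">
          <did>
            <unittitle>Letters, 1901</unittitle>
            <unitid>MS-001-A-1</unitid>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
```

Note how `unittitle` appears three times with three different meanings, which is exactly why the order of processing matters.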

Here's the (generic version of the) XSLT code I came up with:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

<xsl:template match="/ead">
<records>
    <xsl:if test="//dsc">
        <!-- if there are <cXX> nodes, we'll handle the main record differently.
             <cXX> nodes are always found in the 'dsc' node, which contains nothing else -->
        <xsl:call-template name="carefully_process"/>
    </xsl:if>
    <xsl:if test="not(//dsc)">
        <record>
            <!-- Just process the existing nodes -->
            <xsl:apply-templates select="*"/>
        </record>
    </xsl:if>
</records>
</xsl:template>

<xsl:template name="carefully_process">
    <!-- first we'll process all the nodes for the main
         record. Then we'll call the child records -->
    <record>
        <!-- have to be careful not to process //archdesc/dsc yet -->
        <xsl:apply-templates select="*[not(self::archdesc)]"/>
        <xsl:apply-templates select="archdesc/*[not(self::dsc)]"/>

    <!-- Now we can close off the master record, -->
    </record>
    <!-- and process the child records -->
    <xsl:apply-templates select="/ead/archdesc/dsc"/>
</xsl:template>

<xsl:template match="dsc">
    <!-- Start processing the child records (we use for-each to get a good position()) -->
    <xsl:for-each select="*[starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c']">
        <xsl:apply-templates select=".">
            <!-- we pass the unittitle and unitid of the master record, so that child
                 records can be linked to it. We pass the position of the child so that
                 a unitid can be created if it doesn't exist -->
            <xsl:with-param name="partitle" select="normalize-space(/ead/archdesc/did/unittitle)"/>
            <xsl:with-param name="parid" select="normalize-space(/ead/archdesc/did/unitid)"/>
            <xsl:with-param name="pos" select="position()"/>
        </xsl:apply-templates>
    </xsl:for-each>
</xsl:template>

<!-- process child nodes -->
<xsl:template match="*[starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c']" >
<xsl:param name="partitle"/>
<xsl:param name="parid"/>
<xsl:param name="pos"/>
    <!-- start this child record -->
    <record>

        <!-- EAD does not require a unitid, but my code does.
             If it doesn't exist, create it -->
        <xsl:if test="not(./did/unitid)">
            <atom name="unitid">
                <xsl:value-of select="$parid"/><xsl:text>-</xsl:text><xsl:value-of select="$pos"/>
            </atom>
        </xsl:if>

        <!-- get the level of this component -->
        <atom name="eadlevel">
            <xsl:value-of select="concat(translate(substring(@level,1,1),'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ'),substring(@level,2))"/>
        </atom>

        <!-- Do *something* to attach this record to its parent.
             Probably involves $partitle and $parid. For example: -->
        <ref>
            <atom name="unittitle"><xsl:value-of select="$partitle"/></atom>
            <atom name="unitid"><xsl:value-of select="$parid"/></atom>
        </ref>

        <!-- now process all the other nodes -->
        <xsl:apply-templates select="*[not(starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c')]"/>

    <!-- finish this child record -->
    </record>

    <!-- prep the variables we'll need for attaching any child records (<cXX+1>) to this record -->
    <xsl:variable name="this_title">
        <xsl:value-of select="normalize-space(./did/unittitle)"/>
    </xsl:variable> 
    <xsl:variable name="this_id">
        <xsl:if test="./did/unitid">
            <xsl:value-of select="./did/unitid"/>
        </xsl:if>
        <xsl:if test="not(./did/unitid)">
            <xsl:value-of select="$parid"/><xsl:text>-</xsl:text><xsl:value-of select="$pos"/>
        </xsl:if>
    </xsl:variable>

    <!-- now process the children of this node -->
    <xsl:for-each select="*[starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c']">
        <xsl:apply-templates select=".">
            <xsl:with-param name="partitle" select="$this_title"/>
            <xsl:with-param name="parid" select="$this_id"/>
            <xsl:with-param name="pos" select="position()"/>
        </xsl:apply-templates>
    </xsl:for-each>
</xsl:template>

<!-- these are usually just wrappers. Go one level deeper -->
<xsl:template match="descgrp|eadheader|revisiondesc|filedesc|titlestmt|profiledesc|archdesc|archdescgrp|daogrp|langusage|did|frontmatter">
    <xsl:apply-templates select="*"/>
</xsl:template>

<!-- below this point, add templates for processing specific EAD units
     of information. For example, the template might look like

<xsl:template match="titleproper">
    <atom name="titleproper">
        <xsl:value-of select="normalize-space(.)"/>
    </atom>
</xsl:template>
-->

<!-- instead of having a template for each EAD information unit, consider
     a generic template that handles them all the same way. For example:
-->
<xsl:template match="*">
    <atom>
        <xsl:attribute name="name"><xsl:value-of select="name()"/></xsl:attribute>
        <xsl:value-of select="normalize-space(.)"/>
    </atom>
</xsl:template>

</xsl:stylesheet>
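
The component test (`starts-with(name(),'c0') or starts-with(name(),'c1') or name() = 'c'`) is repeated four times in the stylesheet above. Since it already declares version 2.0, one possible simplification, assuming an XSLT 2.0 processor is actually in use, is the XPath 2.0 `matches()` function. The sketch below is a standalone illustration rather than a drop-in replacement: the `components` and `component` output elements are hypothetical, and the pattern is slightly stricter than the original test because it accepts only `c` and `c01` through `c12`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical entry point: start from the components directly under dsc -->
  <xsl:template match="/">
    <components>
      <xsl:apply-templates
          select="//dsc/*[matches(local-name(), '^c(0[1-9]|1[0-2])?$')]"/>
    </components>
  </xsl:template>

  <!-- Matches <c> and the numbered components <c01> ... <c12>.
       matches() is XPath 2.0 and is not available in XSLT 1.0 processors. -->
  <xsl:template match="*[matches(local-name(), '^c(0[1-9]|1[0-2])?$')]">
    <component name="{local-name()}" level="{@level}">
      <!-- recurse into nested components only -->
      <xsl:apply-templates
          select="*[matches(local-name(), '^c(0[1-9]|1[0-2])?$')]"/>
    </component>
  </xsl:template>
</xsl:stylesheet>
```

With an XSLT 1.0 processor the original `starts-with` test remains the portable choice.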
