Selecting first descendant of a certain name for each instance of a particular ancestor
Question
I have some complex MS-Office XML that looks like what you see at the link, but the full source is much longer, with many p:sld and p:notes children of the document root, always alternating in the order p:sld, p:notes, p:sld, p:notes, and so on.
http://pastie.org/9604783
Thanks to JLRishe, I have some XSL that extracts descendant a:t elements and wraps their contents in various tags based on context.
That XSL is as follows:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
  <xsl:output method="xml"/>
  <xsl:template match="/">
    <document>
      <!-- Process every text run (a:t) in document order -->
      <xsl:apply-templates select="//a:t"/>
    </document>
  </xsl:template>
  <xsl:template match="a:t">
    <!-- Non-empty when the run is on a slide / on a notes page -->
    <xsl:variable name="sldAncestor" select="ancestor::p:sld" />
    <xsl:variable name="notesAncestor" select="ancestor::p:notes" />
    <!-- @lvl of the nearest preceding sibling of the enclosing a:r
         (the a:pPr only for the first run of a paragraph) -->
    <xsl:variable name="rAncestorPreLevel"
                  select="ancestor::a:r/preceding-sibling::*[1]/@lvl" />
    <!-- Output element name, chosen from page type and indent level -->
    <xsl:variable name="wrapperName">
      <xsl:choose>
        <xsl:when test="$sldAncestor and $rAncestorPreLevel = '1'">
          <xsl:text>SlideBullet</xsl:text>
        </xsl:when>
        <xsl:when test="$sldAncestor and $rAncestorPreLevel = '2'">
          <xsl:text>SlideBullet1</xsl:text>
        </xsl:when>
        <xsl:when test="$sldAncestor and $rAncestorPreLevel = '3'">
          <xsl:text>SlideBullet2</xsl:text>
        </xsl:when>
        <xsl:when test="$notesAncestor and $rAncestorPreLevel = '0'">
          <xsl:text>StudentNotes</xsl:text>
        </xsl:when>
        <xsl:when test="$notesAncestor and $rAncestorPreLevel = '1'">
          <xsl:text>StudentNotes</xsl:text>
        </xsl:when>
        <xsl:when test="$notesAncestor and $rAncestorPreLevel = '2'">
          <xsl:text>Student_Notes_Bullet</xsl:text>
        </xsl:when>
        <xsl:when test="$notesAncestor and $rAncestorPreLevel = '3'">
          <xsl:text>Student_Notes_Bullet_1</xsl:text>
        </xsl:when>
        <xsl:otherwise>Body</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <xsl:element name="{$wrapperName}">
      <xsl:value-of select="." />
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
But I want to extend it so that it also selects the first a:t element that appears inside each p:sld and wraps that in the tags <SlideTitleGhost></SlideTitleGhost>.
Similarly, I want to select the first a:t element inside each p:notes element, precede it with <PageBreak /> and wrap its contents in <StudentNotes></StudentNotes>.
Note that not all a:t elements are siblings. Sibling a:t elements are children of the same a:r element, but each p:notes or p:sld element has multiple a:r descendants, and those a:r elements cannot be expected to be siblings either. The last part of the XPath to each a:t element is //p:cSld/p:spTree/p:sp/p:txBody/a:p/a:r/a:t
I'm using Saxon-HE on Windows but could switch processors if needed.
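A minimal sketch of one way to pick out the first a:t per page in XSLT 1.0, for illustration only: compare the current node's generate-id() with that of ($page//a:t)[1], where $page is the enclosing p:sld or p:notes. Because the parenthesized expression is evaluated over all a:t descendants of the page, it does not matter that the a:t and a:r elements are not siblings. The Java wrapper uses the JDK's built-in javax.xml.transform processor only so the example runs without extra dependencies; the stylesheet itself is plain XSLT 1.0 and should also run under Saxon-HE. The class name FirstATDemo, the sample input, and the Body fallback (a stand-in for the wrapperName logic in the question) are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FirstATDemo {
    // XSLT 1.0 stylesheet: marks the first a:t of each p:sld and p:notes,
    // however deeply nested, and wraps every other a:t in <Body>.
    static final String XSLT = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
            xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
            exclude-result-prefixes="a p">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <xsl:template match="/">
            <document><xsl:apply-templates select="//a:t"/></document>
          </xsl:template>
          <xsl:template match="a:t">
            <!-- The slide or notes page containing this run (empty if neither) -->
            <xsl:variable name="page" select="ancestor::p:sld | ancestor::p:notes"/>
            <!-- ($page//a:t)[1] is the first a:t under the page in document order.
                 A pattern such as p:sld//a:t[1] would instead match every a:t
                 that is the first a:t child of its parent, i.e. one per a:r. -->
            <xsl:variable name="isFirst"
                select="generate-id(.) = generate-id(($page//a:t)[1])"/>
            <!-- Body is a hypothetical stand-in for the wrapperName logic -->
            <xsl:choose>
              <xsl:when test="$isFirst and ancestor::p:sld">
                <SlideTitleGhost><xsl:value-of select="."/></SlideTitleGhost>
              </xsl:when>
              <xsl:when test="$isFirst and ancestor::p:notes">
                <PageBreak/>
                <StudentNotes><xsl:value-of select="."/></StudentNotes>
              </xsl:when>
              <xsl:otherwise><Body><xsl:value-of select="."/></Body></xsl:otherwise>
            </xsl:choose>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical, heavily trimmed input following the path
    // p:cSld/p:spTree/p:sp/p:txBody/a:p/a:r/a:t from the question.
    static final String SAMPLE = """
        <root xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
              xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
          <p:sld><p:cSld><p:spTree>
            <p:sp><p:txBody><a:p><a:r><a:t>Title</a:t></a:r></a:p></p:txBody></p:sp>
            <p:sp><p:txBody><a:p><a:r><a:t>Bullet</a:t></a:r></a:p></p:txBody></p:sp>
          </p:spTree></p:cSld></p:sld>
          <p:notes><p:cSld><p:spTree>
            <p:sp><p:txBody><a:p><a:r><a:t>Notes header</a:t></a:r></a:p></p:txBody></p:sp>
            <p:sp><p:txBody><a:p><a:r><a:t>Notes body</a:t></a:r></a:p></p:txBody></p:sp>
          </p:spTree></p:cSld></p:notes>
        </root>
        """;

    // Runs an XSLT stylesheet over an XML string and returns the serialized result.
    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSLT, SAMPLE));
    }
}
```

Requires JDK 15 or later (text blocks). Because it still relies on document order, this picks the first a:t in the XML, which is not necessarily the title shown at the top of the slide; that is the problem the update further down runs into.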
The desired output is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<document xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
  <SlideTitleGhost>header text</SlideTitleGhost>
  <Body>body text </Body>
  <Body>body text </Body>
  <Body>body text </Body>
  <SlideBullet>bulleted text</SlideBullet>
  <SlideBullet>bulleted text</SlideBullet>
  <SlideBullet>bulleted text</SlideBullet>
  <SlideBullet1>bulleted2 text</SlideBullet1>
  <SlideBullet1>bulleted2 text</SlideBullet1>
  <SlideBullet1>bulleted2 text</SlideBullet1>
  <SlideBullet1>bulleted2 text</SlideBullet1>
  <SlideBullet>bulleted text</SlideBullet>
  <SlideBullet>bulleted text</SlideBullet>
  <SlideBullet>bulleted text</SlideBullet>
  <SlideBullet>bulleted text</SlideBullet>
  <Body>body text</Body>
  <Body>body text</Body>
  <Body>footer text</Body>
  <Body>10</Body>
  <Body>10</Body>
  <PageBreak />
  <StudentNotes>notes header text</StudentNotes>
  <Body>notes body text</Body>
  <StudentNotes>notes body text</StudentNotes>
  <StudentNotes>notes table header text</StudentNotes>
  <StudentNotes>notes table header text</StudentNotes>
  <StudentNotes>notes table body text</StudentNotes>
  <StudentNotes>table body text</StudentNotes>
  <StudentNotes>notes table body text</StudentNotes>
  <StudentNotes>notes table body text</StudentNotes>
  <StudentNotes>notes table body text</StudentNotes>
  <StudentNotes>notes table body text</StudentNotes>
</document>
Recommended answer
I was able to get close enough to the desired outcome (and get rid of the last a:t element under each p:sld) with the following XSL:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
  <xsl:output method="xml"/>
  <xsl:template match="/">
    <document>
      <xsl:apply-templates select="//a:t"/>
    </document>
  </xsl:template>
  <xsl:template match="a:t">
    <xsl:variable name="sldAncestor" select="ancestor::p:sld" />
    <xsl:variable name="notesAncestor" select="ancestor::p:notes" />
    <!-- Paragraph indent level, read from the a:pPr sibling of the enclosing a:r -->
    <xsl:variable name="rAncestorPreLevel" select="ancestor::a:r/preceding-sibling::a:pPr/@lvl" />
    <!-- Placeholder type of the enclosing shape, e.g. 'title' for the slide title -->
    <xsl:variable name="SlideTitle" select="ancestor::p:txBody/preceding-sibling::p:nvSpPr/p:nvPr/p:ph/@type" />
    <xsl:variable name="wrapperName">
      <xsl:choose>
        <xsl:when test="$sldAncestor and $rAncestorPreLevel = '1'">
          <xsl:text>SlideBullet</xsl:text>
        </xsl:when>
        <xsl:when test="$sldAncestor and $rAncestorPreLevel = '2'">
          <xsl:text>SlideBullet1</xsl:text>
        </xsl:when>
        <xsl:when test="$sldAncestor and $rAncestorPreLevel = '3'">
          <xsl:text>SlideBullet2</xsl:text>
        </xsl:when>
        <xsl:when test="$sldAncestor and $SlideTitle = 'title'">
          <xsl:text>SlideTitleGhost</xsl:text>
        </xsl:when>
        <xsl:when test="$notesAncestor and not(ancestor::a:r/preceding-sibling::a:pPr/@lvl)">
          <xsl:text>StudentNotes</xsl:text>
        </xsl:when>
        <xsl:when test="$notesAncestor and $rAncestorPreLevel = '1'">
          <xsl:text>StudentNotes</xsl:text>
        </xsl:when>
        <xsl:when test="$notesAncestor and $rAncestorPreLevel = '2'">
          <xsl:text>Student_Notes_Bullet</xsl:text>
        </xsl:when>
        <xsl:when test="$notesAncestor and $rAncestorPreLevel = '3'">
          <xsl:text>Student_Notes_Bullet_1</xsl:text>
        </xsl:when>
        <xsl:otherwise>SlideTopic</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <!-- Runs inside a field (a:fld) on a notes page are replaced by a PageBreak;
         every other run is wrapped in the element chosen above -->
    <xsl:choose>
      <xsl:when test="not($notesAncestor and ancestor::a:fld)">
        <xsl:element name="{$wrapperName}">
          <xsl:value-of select="." />
        </xsl:element>
      </xsl:when>
      <xsl:when test="$notesAncestor and ancestor::a:fld">
        <xsl:element name="PageBreak"></xsl:element>
      </xsl:when>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
I did it by identifying a condition that was true of the first a:t descendant of each p:sld element, and only of that one: its enclosing p:txBody is preceded by a p:nvSpPr whose placeholder has type "title" (ancestor::p:txBody/preceding-sibling::p:nvSpPr/p:nvPr/p:ph/@type = 'title'). The second xsl:choose, added at the bottom, let me throw out the last a:t in each p:sld, which wasn't needed in the output anyway, and use that as the moment to insert the <PageBreak> tag that I did want before the first a:t descendant of each p:notes.
Update: it turns out this isn't a solution, because on many slides the document order doesn't match the top-to-bottom order in which the text appears in the source PowerPoint document. In many cases, the title text shown at the top of a slide is an a:t element that comes after other a:t elements in document order.
I'm working on a solution that applies two different templates depending on whether a child of the root is a p:sld or a p:notes, by applying templates to "p:sld|p:notes" when the context is the root element.
If it selects a p:sld, the XSLT looks up the value of the descendant a:t that would be wrapped in <SlideTitleGhost>, stores that value in a variable, and outputs <SlideTitleGhost>$variable</SlideTitleGhost>. It then applies templates to the descendant a:t elements as described above, except that the a:t elements whose contents would have been wrapped in <SlideTitleGhost> are dropped.
If it selects a p:notes, it simply applies the template for a:t. The <PageBreak></PageBreak> that marks the start of each p:notes is already inserted at the point where the last a:t element is dropped.
Currently, though, I'm getting empty output, so any advice on how to do what I've described above would be welcome.
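A sketch of the two-template approach described in the update, under stated assumptions: the title is the p:sp whose placeholder (p:nvSpPr/p:nvPr/p:ph) has type title, as in the answer's own condition, or ctrTitle, the centered-title placeholder type used on title-slide layouts; and a plain Body element stands in for the question's wrapperName logic. Instead of storing the title in a variable, the p:sld template writes the title's runs first and then applies templates to every a:t outside the title shape, so the title comes first whatever its position in document order. On the empty output, one possible cause (a guess, since that stylesheet isn't shown): from a match="/" template the context is the document node, so select="p:sld|p:notes" finds nothing unless a p:sld is itself the document element; //p:sld | //p:notes avoids that. The class name TitleFirstDemo and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TitleFirstDemo {
    // XSLT 1.0: one template per page type; the slide title is emitted first.
    static final String XSLT = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
            xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
            exclude-result-prefixes="a p">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <!-- //p:sld | //p:notes finds the pages at any depth, in document order -->
          <xsl:template match="/">
            <document><xsl:apply-templates select="//p:sld | //p:notes"/></document>
          </xsl:template>
          <xsl:template match="p:sld">
            <!-- Title text first; the runs of a multi-run title are concatenated -->
            <SlideTitleGhost>
              <xsl:for-each select=".//p:sp[p:nvSpPr/p:nvPr/p:ph[@type = 'title' or @type = 'ctrTitle']]//a:t">
                <xsl:value-of select="."/>
              </xsl:for-each>
            </SlideTitleGhost>
            <!-- Then every other run, with the title shape's runs dropped -->
            <xsl:apply-templates select=".//a:t[not(ancestor::p:sp/p:nvSpPr/p:nvPr/p:ph[@type = 'title' or @type = 'ctrTitle'])]"/>
          </xsl:template>
          <xsl:template match="p:notes">
            <PageBreak/>
            <xsl:apply-templates select=".//a:t"/>
          </xsl:template>
          <!-- Hypothetical stand-in for the question's wrapperName logic -->
          <xsl:template match="a:t">
            <Body><xsl:value-of select="."/></Body>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical input where the title shape comes AFTER a body shape in
    // document order (p:nvSpPr is trimmed down to the p:nvPr/p:ph used here).
    static final String SAMPLE = """
        <root xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
              xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
          <p:sld><p:cSld><p:spTree>
            <p:sp>
              <p:nvSpPr><p:nvPr><p:ph idx="1"/></p:nvPr></p:nvSpPr>
              <p:txBody><a:p><a:r><a:t>Bullet</a:t></a:r></a:p></p:txBody>
            </p:sp>
            <p:sp>
              <p:nvSpPr><p:nvPr><p:ph type="title"/></p:nvPr></p:nvSpPr>
              <p:txBody><a:p><a:r><a:t>Slide </a:t></a:r><a:r><a:t>title</a:t></a:r></a:p></p:txBody>
            </p:sp>
          </p:spTree></p:cSld></p:sld>
          <p:notes><p:cSld><p:spTree>
            <p:sp><p:txBody><a:p><a:r><a:t>Notes text</a:t></a:r></a:p></p:txBody></p:sp>
          </p:spTree></p:cSld></p:notes>
        </root>
        """;

    // Runs an XSLT stylesheet over an XML string and returns the serialized result.
    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSLT, SAMPLE));
    }
}
```

If a slide has no title or ctrTitle placeholder, this emits an empty SlideTitleGhost; wrapping that element in an xsl:if on the same predicate would suppress it.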