How to handle multiple XPath expressions at once (based on feed structure) or create my own feeds with the same structure
Question
The code below is tested and working; it prints the contents of a feed that has this structure:
<rss>
<channel>
<item>
<pubDate/>
<title/>
<description/>
<link/>
<author/>
</item>
</channel>
</rss>
What I have not managed to do is print feeds that follow the structure below (the difference is in <feed><entry><published>), even though I changed the XPath to /feed//entry. You can see the structure in the page source.
<feed>
<entry>
<published/>
<title/>
<description/>
<link/>
<author/>
</entry>
</feed>
I should add that the code sorts all item elements by their pubDate. For a feed with the second structure, I guess it should sort all entry elements by their published.
I am probably making a mistake in the XPath that I can't find. However, if I eventually manage to print that feed correctly, how can I modify the code to handle different structures all at once?
Is there any service that allows me to create and host my own feeds based on those feeds, so that all of them have the same structure? I hope I have made myself clear. Thank you.
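<?php
// Feed URLs to read (the list is empty in the original post)
$feeds = array();

// Get all feed entries
$entries = array();
foreach ($feeds as $feed) {
    $xml = simplexml_load_file($feed);
    if ($xml === false) {
        continue; // skip feeds that fail to load or parse
    }
    // The XPath expression is left empty in the original post
    $entries = array_merge($entries, $xml->xpath(''));
}
?>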
Recommended answer
The main contribution of this answer is a solution (at the end) that can be used with any number of formats, simply by specifying all alternative "entry" names in the external (global) parameter $postElements and all alternative "published-date" names in the external (global) parameter $pub-dateElements.
Besides this, here is how to specify an XPath expression that selects all /rss//item and all /feed//entry elements.
In the simple case of just two possible document formats, this XPath expression (as proposed by @Josh Davis) works correctly:
/rss//item | /feed//entry
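As a rough illustration of how the union expression could be used from the asker's PHP code, here is a minimal sketch. The in-memory sample feeds and the helper name entryTimestamp are assumptions made for this example, not part of the original code. Note also that real Atom feeds normally declare the default namespace http://www.w3.org/2005/Atom, and in that case an unprefixed path such as /feed//entry selects nothing until a prefix is registered (for example with registerXPathNamespace); the sample data here has no namespace, matching the structure shown in the question.

```php
<?php
// Two tiny in-memory feeds, one per structure (made-up sample data).
$sources = array(
    '<rss><channel>'
  . '<item><pubDate>2011-06-05</pubDate><title>Title1</title></item>'
  . '<item><pubDate>2011-06-07</pubDate><title>Title3</title></item>'
  . '</channel></rss>',
    '<feed>'
  . '<entry><published>2011-06-06</published><title>Title2</title></entry>'
  . '</feed>',
);

// Hypothetical helper: timestamp of an entry, whichever date element it has.
function entryTimestamp(SimpleXMLElement $entry): int
{
    $date = isset($entry->pubDate) ? (string) $entry->pubDate : (string) $entry->published;
    $ts = strtotime($date);
    return $ts === false ? 0 : $ts;
}

$entries = array();
foreach ($sources as $source) {
    $xml = simplexml_load_string($source);
    if ($xml === false) {
        continue; // skip anything that does not parse
    }
    // The union selects <item> under <rss> and <entry> under <feed>.
    $found = $xml->xpath('/rss//item | /feed//entry');
    if (is_array($found)) {
        $entries = array_merge($entries, $found);
    }
}

// Sort newest first, regardless of which format an entry came from.
usort($entries, function ($a, $b) {
    return entryTimestamp($b) <=> entryTimestamp($a);
});

$titles = array();
foreach ($entries as $entry) {
    $titles[] = (string) $entry->title;
}
echo implode(', ', $titles), "\n";
```

Sorting on a parsed timestamp rather than on the raw string is a design choice of this sketch: it keeps the order chronological even when the two formats write their dates differently.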
A more general XPath expression allows the wanted elements to be selected from any number of document formats:
/*[contains($topElements, concat('|',name(),'|'))]
//*[contains($postElements, concat('|',name(),'|'))]
where the variable $topElements should be substituted by a pipe-delimited string of all possible names for a top element, and $postElements should be substituted by a pipe-delimited string of all possible names for an "entry" element. We also allow the "entry" elements to be at different depths in the different document formats.
In particular, for this concrete case the XPath expression will be:
/*[contains('|feed|rss|', concat('|',name(),'|'))]
//*[contains('|item|entry|', concat('|',name(),'|'))]
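Because SimpleXML's xpath() takes a plain string, the substitution described above has to be done before the call. The following is only a sketch of one way to do that; the function name buildEntryXPath is hypothetical, and it assumes the names are ordinary element names (so they contain neither quotes nor pipes).

```php
<?php
// Hypothetical helper: builds the generalized expression from plain name lists.
function buildEntryXPath(array $topNames, array $postNames): string
{
    // Wrap each list in pipes so that only whole names can match.
    $top  = '|' . implode('|', $topNames) . '|';
    $post = '|' . implode('|', $postNames) . '|';

    return "/*[contains('$top', concat('|',name(),'|'))]"
         . "//*[contains('$post', concat('|',name(),'|'))]";
}

$xpath = buildEntryXPath(array('feed', 'rss'), array('item', 'entry'));
echo $xpath, "\n";

// Quick check against a small sample document (made-up data).
$xml = simplexml_load_string(
    '<feed><entry><title>Title1</title></entry><entry><title>Title2</title></entry></feed>'
);
$entries = $xml->xpath($xpath);
echo count($entries), "\n";
```

Supporting another format then only means adding its names to the two arrays; the expression itself does not change shape.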
The rest of this answer shows how the complete processing can be done entirely in XSLT, easily and elegantly.
I. A gentle introduction
Such processing is quite simple with XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<myFeed>
<xsl:apply-templates/>
</myFeed>
</xsl:template>
<xsl:template match="channel|feed">
<xsl:apply-templates select="*">
<xsl:sort select="pubDate|published" order="descending"/>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="item|entry">
<post>
<xsl:apply-templates mode="identity"/>
</post>
</xsl:template>
<xsl:template match="pubDate|published" mode="identity">
<publicationDate>
<xsl:apply-templates/>
</publicationDate>
</xsl:template>
<xsl:template match="node()|@*" mode="identity">
<xsl:copy>
<xsl:apply-templates select="node()|@*" mode="identity"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to this XML document (format 1):
<rss>
<channel>
<item>
<pubDate>2011-06-05</pubDate>
<title>Title1</title>
<description>Description1</description>
<link>Link1</link>
<author>Author1</author>
</item>
<item>
<pubDate>2011-06-06</pubDate>
<title>Title2</title>
<description>Description2</description>
<link>Link2</link>
<author>Author2</author>
</item>
<item>
<pubDate>2011-06-07</pubDate>
<title>Title3</title>
<description>Description3</description>
<link>Link3</link>
<author>Author3</author>
</item>
</channel>
</rss>
and when it is applied to this equivalent document (format 2):
<feed>
<entry>
<published>2011-06-05</published>
<title>Title1</title>
<description>Description1</description>
<link>Link1</link>
<author>Author1</author>
</entry>
<entry>
<published>2011-06-06</published>
<title>Title2</title>
<description>Description2</description>
<link>Link2</link>
<author>Author2</author>
</entry>
<entry>
<published>2011-06-07</published>
<title>Title3</title>
<description>Description3</description>
<link>Link3</link>
<author>Author3</author>
</entry>
</feed>
in both cases the same wanted, correct result is produced:
<myFeed>
<post>
<publicationDate>2011-06-07</publicationDate>
<title>Title3</title>
<description>Description3</description>
<link>Link3</link>
<author>Author3</author>
</post>
<post>
<publicationDate>2011-06-06</publicationDate>
<title>Title2</title>
<description>Description2</description>
<link>Link2</link>
<author>Author2</author>
</post>
<post>
<publicationDate>2011-06-05</publicationDate>
<title>Title1</title>
<description>Description1</description>
<link>Link1</link>
<author>Author1</author>
</post>
</myFeed>
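For readers who would rather stay in PHP, the same normalization can be approximated without XSLT. This is a swapped-in technique (SimpleXML plus DOMDocument), not the XSLT approach of this answer, and it is only a sketch: the name lists, the sample data, and the closure name $timestampOf are assumptions, and it handles only flat child elements like those in the samples above.

```php
<?php
// Alternative element names per format (hypothetical lists; extend as needed).
$postNames = array('item', 'entry');
$dateNames = array('pubDate', 'published');

// Sample inputs, one per structure (made-up data).
$sources = array(
    '<rss><channel><item><pubDate>2011-06-05</pubDate><title>Title1</title></item></channel></rss>',
    '<feed><entry><published>2011-06-07</published><title>Title3</title></entry></feed>',
);

// Collect every element whose name is one of the "entry" names.
$posts = array();
foreach ($sources as $source) {
    $xml = simplexml_load_string($source);
    if ($xml === false) {
        continue; // skip anything that does not parse
    }
    foreach ($xml->xpath('//*') as $node) {
        if (in_array($node->getName(), $postNames, true)) {
            $posts[] = $node;
        }
    }
}

// Hypothetical helper: timestamp taken from the first date-like child.
$timestampOf = function (SimpleXMLElement $post) use ($dateNames): int {
    foreach ($post->children() as $child) {
        if (in_array($child->getName(), $dateNames, true)) {
            $ts = strtotime((string) $child);
            return $ts === false ? 0 : $ts;
        }
    }
    return 0;
};

// Newest first.
usort($posts, function ($a, $b) use ($timestampOf) {
    return $timestampOf($b) <=> $timestampOf($a);
});

// Rebuild everything as <myFeed><post>...</post></myFeed>.
$doc  = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('myFeed'));
foreach ($posts as $post) {
    $postEl = $root->appendChild($doc->createElement('post'));
    foreach ($post->children() as $child) {
        // Date elements get the common name; everything else keeps its own.
        $name = in_array($child->getName(), $dateNames, true)
            ? 'publicationDate'
            : $child->getName();
        $el = $postEl->appendChild($doc->createElement($name));
        $el->appendChild($doc->createTextNode((string) $child));
    }
}
$result = $doc->saveXML($root);
echo $result, "\n";
```

The resulting string could be saved to a file and served as a single feed with one structure, which is roughly what the question asks for in its last paragraph.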
II. The complete solution
This can be generalized to a parameterized solution:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="postElements" select=
"'|entry|item|'"/>
<xsl:param name="pub-dateElements" select=
"'|published|pubDate|'"/>
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<myFeed>
<xsl:apply-templates select=
"//*[contains($postElements, concat('|',name(),'|'))]">
<xsl:sort order="descending" select=
"*[contains($pub-dateElements, concat('|',name(),'|'))]"/>
</xsl:apply-templates>
</myFeed>
</xsl:template>
<xsl:template match="*">
<xsl:choose>
<xsl:when test=
"contains($postElements, concat('|',name(),'|'))">
<post>
<xsl:apply-templates/>
</post>
</xsl:when>
<xsl:when test=
"contains($pub-dateElements, concat('|',name(),'|'))">
<publicationDate>
<xsl:apply-templates/>
</publicationDate>
</xsl:when>
<xsl:otherwise>
<xsl:call-template name="identity"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
This transformation can be used with any number of formats, simply by specifying all alternative "entry" names in the external (global) parameter $postElements and all alternative "published-date" names in the external (global) parameter $pub-dateElements.
Anyone can try this transformation to verify that, when applied to the two XML documents above, it again produces the same wanted and correct result.