XPath query for an HTML table within XML in PHP DOMDocument

Question

I have an XML file with following tree structure.

<rss xmlns:dc="http://purl.org/dc/elements/1.1/"  xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
    <title>Videos</title>
    <link>https://www.example.com/r/videos/</link>
    <description>A long description of the video.</description>
    <image>...</image>
    <atom:link rel="self" href="http://www.example.com/videos/.xml" type="application/rss+xml"/>
    <item>
        <title>The most used Jazz lick in history.</title>
        <link>
        http://www.example.com/
        </link>
        <guid isPermaLink="true">
         http://www.example.com/
        </guid>
        <pubDate>Mon, 07 Sep 2015 14:43:34 +0000</pubDate>
        <description>
            <table>
                <tr>
                    <td>
                        <a href="http://www.example.com/">
                            <img src="http://www.example.com/.jpg" alt="The most used Jazz lick in history." title="The most used Jazz lick in history." />
                        </a>
                    </td>
                    <td> submitted by 
                        <a href="http://www.example.com/"> jcepiano </a>
                        <br/>
                        <a href="http://www.youtube.com/">[link]</a>
                        <a href="http://www.example.com/">
                            [508 comments]
                        </a>
                    </td>
                </tr>
            </table>
        </description>
        <media:title>The most used Jazz lick in history.</media:title>
        <media:thumbnail url="http://example.jpg"/>
    </item>
</channel>
</rss>

Here, the html table element is embedded inside XML and that's confusing me.

Now I want to pick the text node values for //channel/item/title and href value for //channel/item/description/table/tr/td[1]/a[1] (with a text node value = "[link]")

Above in 2nd case, I am looking for the value of 2nd a (with a text node value = "[link]"), inside 2nd td inside tr, table, description, item, channel.

I am using PHP DOMDocument();

I have been looking for a perfect solution for this for 2 days now, can you please let me know how would this happen?

Also I need to count the total number of items in the feed, right now I am doing like this:

...
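$queryResult = $xpathvar->query('//item/title');
$total = 0; // start at zero so the final value equals the number of items
foreach ($queryResult as $result) {
    $total++;
}
echo $total; // print the counter, not an undefined $title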

And I also need a reference link for XPath query selectors' rules.

Thanks in advance! :)

Recommended answer

You wrote that you wanted the length of the result set of the following query:

$queryResult = $xpathvar->query('//item/title');

I assume that $xpathvar here is of type DOMXPath. If so, its query() method returns a DOMNodeList, which has a length property, as described in the PHP manual. Instead of using foreach, simply use:

$length = $xpathvar->query('//item/title')->length;
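
To make that concrete, here is a minimal sketch. The trimmed-down feed and the variable $count are made up for illustration; the second variant uses DOMXPath::evaluate() with the XPath count() function, which is an alternative I am adding here, not something from your code.

```php
<?php
// Hypothetical, trimmed-down feed used only for illustration.
$xml = <<<'XML'
<rss version="2.0">
<channel>
    <title>Videos</title>
    <item><title>First</title></item>
    <item><title>Second</title></item>
</channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpathvar = new DOMXPath($doc);

// query() returns a DOMNodeList; its length is the number of matched nodes.
$length = $xpathvar->query('//item/title')->length;

// Alternative: let XPath do the counting. count() comes back as a float,
// so it is cast to int here.
$count = (int) $xpathvar->evaluate('count(//item)');

echo $length, "\n";
echo $count, "\n";
```

Note that //item/title does not match the channel's own title, because that title is not a child of an item.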

Now I want to pick the text node values for //channel/item/title

You can use the expression //channel/item/title/text().

and href value for //channel/item/description/table/tr/td[1]/a[1] (with a text node value = "[link]")

Your expression here selects any tr, the first td under that, then the first a. But the first a does not have a value of "[link]" in your source. If you want that, though, you can use:

//channel/item/description/table/tr/td[1]/a[1]/@href

But it seems that you want:

//channel/item/description/table/tr/td/a[. = "[link]"][1]/@href

which finds, within each td, the first a element whose value (text node) is exactly "[link]".
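
As a rough sketch of how that expression behaves in PHP, assuming the description really contains the table as XML elements the way your sample shows it (the feed below is a trimmed copy of yours, and the variable names are mine):

```php
<?php
// Trimmed copy of the feed from the question; only the relevant parts are kept.
$xml = <<<'XML'
<rss version="2.0">
<channel>
    <item>
        <title>The most used Jazz lick in history.</title>
        <description>
        <table>
            <tr>
                <td><a href="http://www.example.com/"><img src="http://www.example.com/.jpg" alt=""/></a></td>
                <td> submitted by
                    <a href="http://www.example.com/"> jcepiano </a>
                    <br/>
                    <a href="http://www.youtube.com/">[link]</a>
                    <a href="http://www.example.com/">[508 comments]</a>
                </td>
            </tr>
        </table>
        </description>
    </item>
</channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// The predicate compares the string value of each a element with "[link]".
$expr = '//channel/item/description/table/tr/td/a[. = "[link]"][1]/@href';
$hrefs = $xpath->query($expr);

// The result is a list of attribute nodes; nodeValue holds the URL.
foreach ($hrefs as $attr) {
    echo $attr->nodeValue, "\n";
}
```

The comparison is exact, so an a element whose text has surrounding whitespace would not match. If that can happen in your feed, a[normalize-space(.) = "[link]"] is more forgiving.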

Above in 2nd case, I am looking for the value of 2nd a (with a text node value = "[link]"), inside 2nd td inside tr, table, description, item, channel.

Not sure if this was a separate question or meant to explain the previous one. Regardless, the answer is the same as in the previous one, unless you explicitly want to search for the 2nd a etc. (i.e., search by position), in which case you can use numeric predicates.
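
A small sketch of the positional variant, using a made-up fragment that mirrors the table in your feed (the "thumbnail" text and the variable names are only for illustration):

```php
<?php
// Hypothetical fragment mirroring the table from the question.
$xml = <<<'XML'
<table>
    <tr>
        <td><a href="http://www.example.com/">thumbnail</a></td>
        <td> submitted by
            <a href="http://www.example.com/"> jcepiano </a>
            <br/>
            <a href="http://www.youtube.com/">[link]</a>
        </td>
    </tr>
</table>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// td[2] is the second td child of each tr; a[2] is the second a child of that td.
// Other elements such as br are not counted by a[2].
$href = $xpath->evaluate('string(/table/tr/td[2]/a[2]/@href)');

echo $href, "\n";
```

Positional predicates break as soon as the feed adds or drops a link, so matching on the "[link]" text is probably the more robust choice if the layout is not under your control.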

Note: you start most of your expressions with //expr, which essentially means: search the whole tree at any depth for the expression expr. This is potentially expensive, and if you know the path from the root (or from a relative starting node) to the nodes you need, a direct path is better and usually more performant. In your case, you can replace //channel with /*/channel (because channel is a direct child of the root element).
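
One way to put the direct-path advice into practice is to select the items once and then run relative expressions against each item as the context node. This is only a sketch: the two-item feed, the URLs and the $rows array are invented for the example.

```php
<?php
// Hypothetical two-item feed used only for illustration.
$xml = <<<'XML'
<rss version="2.0">
<channel>
    <title>Videos</title>
    <item>
        <title>First</title>
        <description><table><tr><td><a href="http://www.example.com/a">[link]</a></td></tr></table></description>
    </item>
    <item>
        <title>Second</title>
        <description><table><tr><td><a href="http://www.example.com/b">[link]</a></td></tr></table></description>
    </item>
</channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$rows = [];
// Direct path from the root: no descendant search with //.
foreach ($xpath->query('/rss/channel/item') as $item) {
    // Relative expressions are evaluated against the context node $item.
    $title = $xpath->evaluate('string(title)', $item);
    $href  = $xpath->evaluate('string(description/table/tr/td/a[. = "[link]"][1]/@href)', $item);
    $rows[] = [$title, $href];
}

foreach ($rows as [$title, $href]) {
    echo $title, ' => ', $href, "\n";
}
```

Passing $item as the second argument keeps each title paired with the link from the same item, which is harder to guarantee with two separate document-wide queries.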
