DOMDocument:如何获取内部HTML作为以换行符分隔的字符串? [英] DOMDocument : how to get inner HTML as Strings separated by line-breaks?

查看:144
本文介绍了DOMDocument:如何获取内部HTML作为以换行符分隔的字符串?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

<blockquote>
 <p>
   2 1/2 cups sweet cherries, pitted<br>
   1 tablespoon cornstarch <br>
   1/4 cup fine-grain natural cane sugar
 </p>
</blockquote>

我想在'p'标签中获取文本.您会看到有3条不同的行,我想在每行添加一些额外的文本后分别打印它们.这是我的代码块

    $tags = $dom->getElementsByTagName('blockquote');
    foreach($tags as $tag)
    {
        $datas = $tag->getElementsByTagName('p');
        foreach($datas as $data)
        {
            $line = $data->nodeValue;
            echo $line;
        }
    } 

主要问题是$ line在'p'标记内包含全文,包括'br'标记.我如何分隔三行以分别对待它们?

提前谢谢.

解决方案

您可以使用XPath做到这一点.您要做的就是查询文本节点.无需爆炸或类似的东西:

$dom = new DOMDocument;
$dom->loadHtml($html);
$xp = new DOMXPath($dom);
foreach ($xp->query('/html/body/blockquote/p/text()') as $textNode) {
    echo "\n<li>", trim($textNode->textContent);
}

非XPath替代方案是迭代P标记的子代,仅在它们是DOMText节点时才输出:

$dom = new DOMDocument;
$dom->loadHtml($html);
foreach ($dom->getElementsByTagName('p')->item(0)->childNodes as $pChild) {
    if ($pChild->nodeType === XML_TEXT_NODE) {
        echo "\n<li>", trim($pChild->textContent);
    }
}

两者都会输出(演示)

<li>2 1/2 cups sweet cherries, pitted
<li>1 tablespoon cornstarch
<li>1/4 cup fine-grain natural cane sugar

另请参见 PHP中的DOMDocument 以获得有关的解释节点概念.了解使用DOM时至关重要.

<blockquote>
 <p>
   2 1/2 cups sweet cherries, pitted<br>
   1 tablespoon cornstarch <br>
   1/4 cup fine-grain natural cane sugar
 </p>
</blockquote>

hi , i want to get the text inside 'p' tag . you see there are three different line and i want to print them separately after adding some extra text with each line . here is my code block

    $tags = $dom->getElementsByTagName('blockquote');
    foreach($tags as $tag)
    {
        $datas = $tag->getElementsByTagName('p');
        foreach($datas as $data)
        {
            $line = $data->nodeValue;
            echo $line;
        }
    } 

main problem is $line contains the full text inside 'p' tag including 'br' tag . how can i separate the three lines to treat them respectively ??

thanks in advance.

解决方案

You can do that with XPath. All you have to do is query the text nodes. No need to explode or something like that:

$dom = new DOMDocument;
$dom->loadHtml($html);
$xp = new DOMXPath($dom);
foreach ($xp->query('/html/body/blockquote/p/text()') as $textNode) {
    echo "\n<li>", trim($textNode->textContent);
}

The non-XPath alternative would be to iterate the children of the P tag and only output them when they are DOMText nodes:

$dom = new DOMDocument;
$dom->loadHtml($html);
foreach ($dom->getElementsByTagName('p')->item(0)->childNodes as $pChild) {
    if ($pChild->nodeType === XML_TEXT_NODE) {
        echo "\n<li>", trim($pChild->textContent);
    }
}

Both will output (demo)

<li>2 1/2 cups sweet cherries, pitted
<li>1 tablespoon cornstarch
<li>1/4 cup fine-grain natural cane sugar

Also see DOMDocument in php for an explanation of the node concept. It's crucial to understand when working with DOM.

这篇关于DOMDocument:如何获取内部HTML作为以换行符分隔的字符串?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆