PHP DOMDocument stripping HTML tags

Question

I'm working on a small templating engine, and I'm using DOMDocument to parse the pages. My test page so far looks like this:

<block name="content">

   <?php echo 'this is some rendered PHP! <br />' ?>

   <p>Main column of <span>content</span></p>

</block>

And part of my class looks like this:

private function parse($tag, $attr = 'name')
{
    $strict = 0;
    /*** the array to return ***/
    $out = array();
    if($this->totalBlocks() > 0)
    {
        /*** a new dom object ***/
        $dom = new DOMDocument;
        /*** discard white space ***/
        $dom->preserveWhiteSpace = false;

        /*** load the html into the object ***/
        if($strict==1)
        {
            $dom->loadXML($this->file_contents);
        }
        else
        {
            $dom->loadHTML($this->file_contents);
        }

        /*** the tag by its tag name ***/
        $content = $dom->getElementsByTagName($tag);

        $i = 0;
        foreach ($content as $item)
        {
            /*** add node value to the out array ***/
            $out[$i]['name'] = $item->getAttribute($attr);
            $out[$i]['value'] = $item->nodeValue;
            $i++;
        }
    }

    return $out;
}

I have it working the way I want in that it grabs each <block> on the page and injects its contents into my template. However, it is stripping the HTML tags within the <block>, so it returns the following without the <p> or <span> tags:

this is some rendered PHP! Main column of content

What am I doing wrong here? :) Thanks

Recommended answer

Nothing: nodeValue is the concatenation of the text values of the nodes in that subtree, so it will never contain tags.

What I would do to make an HTML fragment of the tree under $node is this:


$doc = new DOMDocument();
foreach($node->childNodes as $child) {
    $doc->appendChild($doc->importNode($child, true));
}
return $doc->saveHTML();
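
To see the difference side by side, here is a minimal, self-contained sketch. The helper name innerHtml() and the sample markup are assumptions made for illustration and are not from the original post. It passes each child to saveHTML(), which accepts a node argument as of PHP 5.3.6, instead of importing the children into a second document; treat it as one possible variant rather than the definitive approach.

```php
<?php
// Hypothetical helper (not from the original post): serialize the children
// of a node back to markup so that tags such as <p> and <span> survive.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serializes only that node and its subtree.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// <block> is not a known HTML element, so libxml reports a warning for it;
// collect such warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML('<block name="content"><p>Main column of <span>content</span></p></block>');
libxml_clear_errors();

$block = $dom->getElementsByTagName('block')->item(0);

echo $block->nodeValue, "\n";   // text only, tags are gone
echo innerHtml($block), "\n";   // markup of the children is kept
```

In the parse() method from the question, the equivalent change would be to store the serialized children in $out[$i]['value'] instead of $item->nodeValue.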

HTML "fragments" are actually more problematic than you'd think at first, because they tend to lack things like doctypes and character sets, which makes it hard to deterministically go back and forth between portions of a DOM tree and HTML fragments.
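
One visible symptom of this is that loadHTML() treats its input as a whole document, so a fragment comes back wrapped in a doctype and <html>/<body> elements when it is saved. The sketch below shows that, and one way to suppress the wrappers with the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options (these need PHP 5.4+ and a sufficiently recent libxml). The sample markup and variable names are illustrative assumptions, and NOIMPLIED is known to behave oddly when the fragment has more than one top-level element, so this is a sketch rather than a general solution.

```php
<?php
libxml_use_internal_errors(true);

$fragment = '<p>Main column of <span>content</span></p>';

// Default behaviour: the parser adds a doctype plus <html> and <body>.
$full = new DOMDocument();
$full->loadHTML($fragment);
$withWrappers = $full->saveHTML();

// With these options the parser is asked not to add the implied elements
// or a default doctype, so the fragment round-trips more closely.
$bare = new DOMDocument();
$bare->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$withoutWrappers = $bare->saveHTML();

echo $withWrappers, "\n";
echo $withoutWrappers, "\n";
```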
