Remove parent element, keep all inner children in DOMDocument with saveHTML


Question

I'm manipulating a short HTML snippet with XPath. When I output the changed snippet with $doc->saveHTML(), a DOCTYPE is added and HTML / BODY tags wrap the output. I want to remove those wrappers but keep all the children inside, using only the DOMDocument functions. For example:

$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p>
<a href="http://www....."><img src="http://" alt=""></a>
<p>...to be one of those crowning achievements...</p>');
// manipulation goes here
echo htmlentities( $doc->saveHTML() );

This produces:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" ...>
<html><body>
<p><strong>Title...</strong></p>
<a href="http://www....."><img src="http://" alt=""></a>
<p>...to be one of those crowning achievements...</p>
</body></html>
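
To see where the extra markup comes from, it can help to inspect the tree that loadHTML() builds. The following is a minimal sketch, assuming a PHP build with the DOM extension enabled; the one-paragraph input is only illustrative and the variable names are not from the original question.

```php
<?php
// Minimal sketch: list the nodes loadHTML() creates around a bare snippet.
$doc = new DOMDocument();
$doc->loadHTML('<p>Hello</p>');

// Top-level children of the document node (node type and node name).
$topLevel = [];
foreach ($doc->childNodes as $node) {
    $topLevel[] = $node->nodeType . ':' . $node->nodeName;
}

// The implied wrappers sit between the document and the original markup.
$root = $doc->documentElement;                        // implied <html>
$body = $doc->getElementsByTagName('body')->item(0);  // implied <body>
$firstChild = $body->firstChild;                      // the original <p>

echo implode(', ', $topLevel), "\n";
echo $root->nodeName, ' > ', $body->nodeName, ' > ', $firstChild->nodeName, "\n";
```

The doctype and the html / body elements are ordinary nodes in the tree, which is why they can be removed with DOM calls rather than by editing the serialized string.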

I've attempted some of the simple tricks, such as:

# removes doctype
$doc->removeChild($doc->firstChild);

# <body> replaces <html>
$doc->replaceChild($doc->firstChild->firstChild, $doc->firstChild); 

So far, that only removes the DOCTYPE and replaces HTML with BODY. What remains at this point is body > a variable number of elements.

How do I remove the <body> tag but keep all of its children, given that their structure will vary, in a neat, clean way with PHP's DOM manipulation?

Recommended answer

UPDATE

Here's a version that doesn't extend DOMDocument, though I think extending is the proper approach, since you're trying to achieve functionality that isn't built in to the DOM API.

Note: I'm interpreting "clean" and "without workarounds" as keeping all manipulation within the DOM API. As soon as you resort to string manipulation, that's workaround territory.

What I'm doing, just as in the original answer, is leveraging DOMDocumentFragment to manipulate multiple nodes that all sit at the root level. There is no string manipulation going on, which to me qualifies as not being a workaround.

$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title...</strong></p><a href="http://www....."><img src="http://" alt=""></a><p>...to be one of those crowning achievements...</p>');

// Remove doctype node
$doc->doctype->parentNode->removeChild($doc->doctype);

// Remove html element, preserving child nodes
$html = $doc->getElementsByTagName("html")->item(0);
$fragment = $doc->createDocumentFragment();
while ($html->childNodes->length > 0) {
    $fragment->appendChild($html->childNodes->item(0));
}
$html->parentNode->replaceChild($fragment, $html);

// Remove body element, preserving child nodes
$body = $doc->getElementsByTagName("body")->item(0);
$fragment = $doc->createDocumentFragment();
while ($body->childNodes->length > 0) {
    $fragment->appendChild($body->childNodes->item(0));
}
$body->parentNode->replaceChild($fragment, $body);

// Output results
echo htmlentities($doc->saveHTML());
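
The two near-identical loops above can be folded into one function. This is a minimal sketch rather than part of the original answer: unwrapElement is a hypothetical helper name, the sample markup is shortened, and the void return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the DOM API): replaces $element with its
// own child nodes by first moving them into a DOMDocumentFragment.
function unwrapElement(DOMElement $element): void
{
    // Nothing to keep: just drop the element itself.
    if (!$element->hasChildNodes()) {
        $element->parentNode->removeChild($element);
        return;
    }

    $fragment = $element->ownerDocument->createDocumentFragment();
    // appendChild() moves the node, so firstChild advances on each pass.
    while ($element->firstChild !== null) {
        $fragment->appendChild($element->firstChild);
    }
    $element->parentNode->replaceChild($fragment, $element);
}

$doc = new DOMDocument();
$doc->loadHTML('<p><strong>Title</strong></p><p>Body text</p>');

// Remove the doctype node, if the parser added one.
if ($doc->doctype !== null) {
    $doc->removeChild($doc->doctype);
}

// Unwrap the implied <html> element first, then the <body> it contained.
foreach (['html', 'body'] as $tagName) {
    $element = $doc->getElementsByTagName($tagName)->item(0);
    if ($element !== null) {
        unwrapElement($element);
    }
}

$output = $doc->saveHTML();
echo htmlentities($output);
```

The same helper should work for any wrapper element, for example one selected through a DOMXPath query, as long as the node is still attached to a parent.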


ORIGINAL ANSWER

This solution is rather lengthy, but that's because it extends the DOM in order to keep your end code as short as possible.

sliceOutNode is where the magic happens. Let me know if you have any questions:

<?php

class DOMDocumentExtended extends DOMDocument
{
    public function __construct( $version = "1.0", $encoding = "UTF-8" )
    {
        parent::__construct( $version, $encoding );

        $this->registerNodeClass( "DOMElement", "DOMElementExtended" );
    }

    // This method will need to be removed once PHP supports LIBXML_NOXMLDECL
    public function saveXML( ?DOMNode $node = null, $options = 0 )
    {
        $xml = parent::saveXML( $node, $options );

        if( $options & LIBXML_NOXMLDECL )
        {
            $xml = $this->stripXMLDeclaration( $xml );
        }

        return $xml;
    }

    public function stripXMLDeclaration( $xml )
    {
        return preg_replace( "|<\?xml(.+?)\?>[\n\r]?|i", "", $xml );
    }
}

class DOMElementExtended extends DOMElement
{
    public function sliceOutNode()
    {
        $nodeList = new DOMNodeListExtended( $this->childNodes );
        $this->replaceNodeWithNode( $nodeList->toFragment( $this->ownerDocument ) );
    }

    public function replaceNodeWithNode( DOMNode $node )
    {
        return $this->parentNode->replaceChild( $node, $this );
    }
}

class DOMNodeListExtended extends ArrayObject
{
    public function __construct( $mixedNodeList )
    {
        parent::__construct( array() );

        $this->setNodeList( $mixedNodeList );
    }

    private function setNodeList( $mixedNodeList )
    {
        if( $mixedNodeList instanceof DOMNodeList )
        {
            $this->exchangeArray( array() );

            foreach( $mixedNodeList as $node )
            {
                $this->append( $node );
            }
        }
        elseif( is_array( $mixedNodeList ) )
        {
            $this->exchangeArray( $mixedNodeList );
        }
        else
        {
            throw new DOMException( "DOMNodeListExtended only supports a DOMNodeList or array as its constructor parameter." );
        }
    }

    public function toFragment( DOMDocument $contextDocument )
    {
        $fragment = $contextDocument->createDocumentFragment();

        foreach( $this as $node )
        {
            $fragment->appendChild( $contextDocument->importNode( $node, true ) );
        }

        return $fragment;
    }

    // Built-in methods of the original DOMNodeList

    public function item( $index )
    {
        return $this->offsetGet( $index );
    }

    public function __get( $name )
    {
        switch( $name )
        {
            case "length":
                return $this->count();
            break;
        }

        return false;
    }
}

// Load HTML/XML using our fancy DOMDocumentExtended class
$doc = new DOMDocumentExtended();
$doc->loadHTML('<p><strong>Title...</strong></p><a href="http://www....."><img src="http://" alt=""></a><p>...to be one of those crowning achievements...</p>');

// Remove doctype node
$doc->doctype->parentNode->removeChild( $doc->doctype );

// Slice out html node
$html = $doc->getElementsByTagName("html")->item(0);
$html->sliceOutNode();

// Slice out body node
$body = $doc->getElementsByTagName("body")->item(0);
$body->sliceOutNode();

// Pick your poison: XML or HTML output
echo htmlentities( $doc->saveXML( NULL, LIBXML_NOXMLDECL ) );
echo htmlentities( $doc->saveHTML() );
