Extract all content (including HTML) from a div class using PHP

Views: 521 · Published: 2017/6/25 1:07:31 · Tags: php, dom, extract


Problem description

Example HTML...

<html>
<head></head>
<body>
<table>
<tr>
    <td class="rsheader"><b>Header Content</b></td>
</tr>
<tr>
    <td class="rstext">Some text (Most likely will contain lots of HTML</td>
</tr>
</table>
</body>
</html>

I need to convert a page of HTML into a templated version of that HTML page. The HTML page is made up of several boxes, each with a header (referred to in the above code as "rsheader") and some text (referred to in the above code as "rstext").

I'm trying to write a PHP script to retrieve the HTML page, maybe using file_get_contents, and then extract whatever content is within the rsheader and rstext divs. Basically, I don't know how! I've tried experimenting with DOM, but I don't know it too well, and although I did manage to extract the text, it ignored any HTML.

My PHP...

<?php

$html = '<html>
<head></head>
<body>
<table>
<tr>
    <td class="rsheader"><b>Header Content</b></td>
</tr>
<tr>
    <td class="rstext">Some text (Most likely will contain lots of HTML</td>
</tr>
</table>
</body>
</html>';

$dom = new DomDocument();
$dom->loadHtml($html);
$xpath = new DomXpath($dom);
$div = $xpath->query('//*[@class="rsheader"]')->item(0);
echo $div->textContent;

?>

If I do a print_r($div) I see this...

DOMElement Object
    (
    [tagName] => td
    [schemaTypeInfo] => 
    [nodeName] => td
    [nodeValue] => Header Content
    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => 
    [nextSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => td
    [baseURI] => 
    [textContent] => Header Content
    )

As you can see, there are no HTML tags within the textContent node, which leads me to believe I'm going about it the wrong way :(

Really hoping someone might be able to give me some help with this...

Thanks in advance,

Paul

Recommended answer

XPath is probably more than this task needs. I would try using DOMDocument's getElementById() method. The example below is adapted from this post.

NOTE: Updated to use tag and class names instead of element IDs.

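function getChildHtml( $node )
{
    $innerHtml = '';
    $children = $node->childNodes;

    foreach( $children as $child )
    {
        // Append each child's markup exactly once. saveXML() serializes the node
        // together with its tags (XML style, so empty elements appear as e.g. <br/>).
        $innerHtml .= $child->ownerDocument->saveXML( $child );
    }

    return $innerHtml;
}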

$dom = new DomDocument();
$dom->loadHtml( $html );

// Gather all table cells in the document.
$cells = $dom->getElementsByTagName( 'td' );

// Loop through the collected table cells looking for those of class 'rsheader' or 'rstext'.
foreach( $cells as $cell )
{
    if( $cell->getAttribute( 'class' ) == 'rsheader' )
    {
        $headerHtml = getChildHtml( $cell );
        // Do something with header html.
    }

    if( $cell->getAttribute( 'class' ) == 'rstext' )
    {
        $textHtml = getChildHtml( $cell );
        // Do something with text html.
    }
}
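
Since the question started from XPath, the same idea can also be sketched with DOMXPath instead of looping over every td. This is only an illustration under some assumptions: innerHtml() and classQuery() are hypothetical helper names, the extra "box" class in the sample markup is added just to show multi-class matching, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: inner HTML of a node, built by serializing each child.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() accepts a node argument (PHP 5.3.6+) and emits HTML-style markup.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical helper: XPath that matches one class token even when the
// attribute lists several classes, e.g. class="rstext box".
function classQuery(string $class): string
{
    return sprintf('//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]', $class);
}

// Sample markup for illustration only (not the real page).
$html = '<table>
<tr><td class="rsheader"><b>Header Content</b></td></tr>
<tr><td class="rstext box">Some <i>text</i> with <b>markup</b></td></tr>
</table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect the inner HTML of every matching element, grouped by class name.
$result = [];
foreach (['rsheader', 'rstext'] as $class) {
    foreach ($xpath->query(classQuery($class)) as $node) {
        $result[$class][] = innerHtml($node);
    }
}

print_r($result);
```

Compared with an exact comparison on getAttribute('class'), the token-based XPath still matches when an element carries more than one class. Using saveHTML() rather than saveXML() keeps the output in HTML syntax, which is usually what a template needs.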
