How to get innerHTML by class name or id using PHP

Problem description

Hi, I am loading content from an external URL, something like this:

$html=get_data($external_url);

where get_data() is a function that fetches the content using cURL.

Now I want to get the inner HTML of different HTML elements such as h1, div, p and span, selecting them by their class or id.

For example, suppose the content from the external URL ($html) is something like this:

<html>
<title></title>
<body>
    <h1 class="title">I am title</h1>
    <div id="content">
        i am the content.
    </div>
</body>

Now I want to get the inner HTML of the tag with class="title". Similarly, I want to get the inner HTML of the tag with id="content".

How can I do this using PHP? I have no knowledge of DOM or XML. Please help.

Recommended answer

You can use the method DOMDocument::saveHTML(). Since PHP 5.3.6 it accepts an optional node argument and returns the HTML of only that node, including the node's own tags. To get the inner HTML of a node, save each of its child nodes and concatenate the results.

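// Returns the inner HTML of an element when given its child nodes:
// saveHtml($node) serializes a single node, so the children are saved one by one.
function getHtml($nodes) {
  $result = '';
  foreach ($nodes as $node) {
    $result .= $node->ownerDocument->saveHtml($node);
  }
  return $result;
}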
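Here is a small self-contained sketch that shows the difference between a node's outer HTML and its inner HTML. The sample markup and the variable names ($div, $outer, $inner) are made up for illustration. getHtml() is repeated so the block runs on its own. saveHTML($div) includes the div's own tags, while getHtml() over the div's child nodes returns only what sits between those tags.

```php
<?php
// getHtml() as defined above: concatenate the HTML of each node in $nodes.
function getHtml($nodes) {
  $result = '';
  foreach ($nodes as $node) {
    $result .= $node->ownerDocument->saveHTML($node);
  }
  return $result;
}

$dom = new DOMDocument();
// Made-up sample markup for the demonstration.
$dom->loadHTML('<div id="demo">a <b>bold</b> word</div>');
$div = $dom->getElementsByTagName('div')->item(0);

// Outer HTML: the element itself, including its own start and end tags.
$outer = $dom->saveHTML($div);

// Inner HTML: only the children ("a ", <b>bold</b>, " word"), joined together.
$inner = getHtml($div->childNodes);

var_dump($outer);
var_dump($inner);
```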

To fetch the nodes, you can use XPath (the DOMXPath class). Selecting by id is easy.

Get all element nodes:

//*

that have an id attribute with the value "content":

//*[@id="content"]

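Use only the first found node, in case somebody added the same id multiple times:

//*[@id="content"][1]

Note that the [1] predicate binds to the last location step, so this picks the first matching element within each parent. If the duplicates can sit under different parents, wrap the path in parentheses to get the first match in document order:

(//*[@id="content"])[1]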

Get the child nodes. node() matches every kind of child node (elements, text, comments and processing instructions), so the text between the tags is kept as well:

//*[@id="content"][1]/node()

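$dom = new DOMDocument();
// Pages fetched from real sites often make libxml emit parser warnings
// (for example for HTML5 tags such as <nav>); collect them silently instead.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);

echo getHtml($xpath->evaluate('//*[@id="content"][1]/node()'));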
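The following sketch illustrates two details of the id expressions. It uses made-up markup in which the same id appears under two different parents. The first detail is where the [1] predicate applies. The second is how node() differs from *, which selects element children only. The block calls libxml_use_internal_errors(true) on the assumption that libxml may warn about the duplicate id.

```php
<?php
// Quiet libxml: a duplicated id can trigger an "ID ... already defined" warning.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Made-up markup: the same id appears under two different parents.
$dom->loadHTML(
  '<div><p id="content">first <i>one</i></p></div>' .
  '<div><p id="content">second</p></div>'
);
$xpath = new DOMXPath($dom);

// [1] binds to the last location step, so this keeps the first match
// among the children of EACH parent: both <p> elements are selected.
$perParent = $xpath->evaluate('//*[@id="content"][1]');

// Parentheses apply [1] to the whole node-set in document order:
// only the first <p> is selected.
$firstInDocument = $xpath->evaluate('(//*[@id="content"])[1]');

// node() keeps text and element children; * keeps element children only.
$allChildren = $xpath->evaluate('(//*[@id="content"])[1]/node()');
$elementChildren = $xpath->evaluate('(//*[@id="content"])[1]/*');

var_dump($perParent->length);       // int(2)
var_dump($firstInDocument->length); // int(1)
var_dump($allChildren->length);     // int(2): "first " and <i>one</i>
var_dump($elementChildren->length); // int(1): <i>one</i> only
```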

The class attribute is a little more complex. A class attribute is a whitespace-separated token list, so it can contain several class names. Here is a trick for matching a single one of them. The XPath function normalize-space() strips leading and trailing whitespace and collapses every other run of whitespace into a single space. Add a space at the start and at the end and you get a string like " one two three ". Now you can check whether " one " is a substring of it. In XPath:

Normalize the class attribute:

normalize-space(@class)

Add a space at the start and at the end:

concat(" ", normalize-space(@class), " ")

Check whether it contains the substring " title ":

contains(concat(" ", normalize-space(@class), " "), " title ")

Use it to restrict the nodes:

//*[contains(concat(" ", normalize-space(@class), " "), " title ")][1]/node()
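This sketch shows why the padding spaces matter. It uses made-up markup to compare the naive test contains(@class, "title") with the token-based expression. The naive version also matches class="subtitle", because "subtitle" contains the letters "title". The padded version matches only elements whose class list contains the whole token title, and it tolerates extra whitespace inside the attribute. The block also evaluates the intermediate padded string for one element.

```php
<?php
$dom = new DOMDocument();
// Made-up markup: "subtitle" contains the letters "title" but is a different class.
$dom->loadHTML(
  '<h1 class="title main">A</h1>' .
  '<p class="subtitle">B</p>' .
  '<span class="  title  ">C</span>'
);
$xpath = new DOMXPath($dom);

// Naive substring test: also matches class="subtitle".
$naive = $xpath->evaluate('//*[contains(@class, "title")]');

// Token test: pad the normalized class list with spaces, then look for " title ".
$token = $xpath->evaluate(
  '//*[contains(concat(" ", normalize-space(@class), " "), " title ")]'
);

// The intermediate padded string, evaluated with the <h1> as context node.
$h1 = $dom->getElementsByTagName('h1')->item(0);
$padded = $xpath->evaluate('concat(" ", normalize-space(@class), " ")', $h1);

// Collect the text of the elements the token test matched.
$texts = [];
foreach ($token as $node) {
  $texts[] = $node->textContent;
}

var_dump($naive->length); // int(3)
var_dump($texts);         // the h1 ("A") and the span ("C"), but not the p
var_dump($padded);        // string(12) " title main "
```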

Putting it all together:

$html = <<<'HTML'
<html>
<title></title>
<body>
    <h1 class="title">I am title</h1>
    <div id="content">
        i am the <b>content</b>.
    </div>
</body>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);

function getHtml($nodes) {
  $result = '';
  foreach ($nodes as $node) {
    $result .= $node->ownerDocument->saveHtml($node);
  }
  return $result;
}

// first node with the id
var_dump(
  getHtml(
    $xpath->evaluate('//*[@id="content"][1]/node()')
  )
);

// first node with the class
var_dump(
  getHtml(
    $xpath->evaluate(
      '//*[contains(concat(" ", normalize-space(@class), " "), " title ")][1]/node()'
    )
  )
);

// alternative - handling multiple nodes with the same class in a loop
$nodes = $xpath->evaluate(
  '//*[contains(concat(" ", normalize-space(@class), " "), " title ")]'
);
foreach ($nodes as $node) {
  var_dump(getHtml($xpath->evaluate('node()', $node)));
}

Output: https://eval.in/118248

string(40) "
        i am the <b>content</b>.
    "
string(10) "I am title"
string(10) "I am title"
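The original question needs several element types, selected by class or id. For that use case, the steps above could be wrapped in reusable functions. This is a minimal sketch and not part of the original answer. The function names innerHtml(), xpathFor(), innerHtmlByClass() and innerHtmlById() are hypothetical. The class, tag and id arguments are assumed to be plain names without quotes, because they are inserted into the XPath query as-is.

```php
<?php
// Hypothetical helper names; they wrap the DOMDocument/DOMXPath technique above.

// Concatenate the HTML of every child of $node, i.e. its inner HTML.
function innerHtml(DOMNode $node): string {
  $result = '';
  foreach ($node->childNodes as $child) {
    $result .= $node->ownerDocument->saveHTML($child);
  }
  return $result;
}

// Parse an HTML string and return an XPath object for it.
// Assumes $html is non-empty: PHP 8 throws a ValueError for an empty string.
function xpathFor(string $html): DOMXPath {
  $dom = new DOMDocument();
  $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
  $dom->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);        // restore the caller's setting
  return new DOMXPath($dom);
}

// Inner HTML of every element whose class list contains $class.
// $tag limits the match to one element name (e.g. 'h1'); '*' means any element.
// Assumes $class and $tag contain no quotes, since they are pasted into the query.
function innerHtmlByClass(string $html, string $class, string $tag = '*'): array {
  $xpath = xpathFor($html);
  $query = sprintf(
    '//%s[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
    $tag,
    $class
  );
  $result = [];
  foreach ($xpath->evaluate($query) as $node) {
    $result[] = innerHtml($node);
  }
  return $result;
}

// Inner HTML of the first element (in document order) with the given id, or null.
// Assumes $id contains no double quotes.
function innerHtmlById(string $html, string $id): ?string {
  $xpath = xpathFor($html);
  $nodes = $xpath->evaluate(sprintf('(//*[@id="%s"])[1]', $id));
  return $nodes->length > 0 ? innerHtml($nodes->item(0)) : null;
}

$html = '<h1 class="title">I am title</h1><div id="content">i am the <b>content</b>.</div>';
var_dump(innerHtmlByClass($html, 'title', 'h1'));
var_dump(innerHtmlById($html, 'content'));
```

With the question's get_data(), a call could look like innerHtmlByClass(get_data($external_url), 'title', 'h1'). This assumes get_data() returns the page body as a non-empty string.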
