Split all HTML tags into an array

Question

Suppose I have the following code:

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Title of the document</title>
</head>    
<body>
<div id="x">Hello</div>
<p>world</p>
<h1>my name</h1>
</body>
</html>

I need to extract all the HTML tags and put them into an array, like this:

'0' => '<!DOCTYPE html>',
'1' => '<html>',
'2' => '<head>',
'3' => '<meta charset="UTF-8">',
'4' => '<title>Title of the document</title>',
'5' => '</head>',
'6' => '<body>',
'7' => '<div id="x">Hello</div>',
'8' => '<p>world</p>',
'9' => '<h1>my name</h1>',
....

In my case I don't need all the content inside each tag; capturing just the beginning of each tag would already be good enough.

How can I do this?

Recommended answer

Use the following solution with the preg_match_all function:

$html_content = '<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Title of the document</title>
</head>    
<body>
<div id="x">Hello</div>
<p>world</p>
<h1>my name</h1>
</body>
</html>';

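// Match an opening tag, optionally followed by its text content and a closing
// tag (e.g. <title>...</title>), or a standalone closing tag such as </head>.
preg_match_all("/\<\w[^<>]*?\>([^<>]+?\<\/\w+?\>)?|\<\/\w+?\>/i", $html_content, $matches);
// <!DOCTYPE html> is a document type declaration, not an element tag.
// The pattern requires a word character right after "<", so it is not matched.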

print_r($matches[0]);

Output:

Array
(
    [0] => <html>
    [1] => <head>
    [2] => <meta charset="UTF-8">
    [3] => <title>Title of the document</title>
    [4] => </head>
    [5] => <body>
    [6] => <div id="x">Hello</div>
    [7] => <p>world</p>
    [8] => <h1>my name</h1>
    [9] => </body>
    [10] => </html>
)
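The question also asked for <!DOCTYPE html>, which the pattern above deliberately skips. The question also says that capturing just the beginning of each tag is enough. If so, a simpler pattern can return every tag on its own, with the text between tags dropped.

This is a minimal sketch and is not part of the original answer. It assumes the markup never contains "<" or ">" inside attribute values, comments, or <script> blocks. For arbitrary HTML, a real parser such as PHP's DOMDocument is the safer choice. The variable names $tags and $opening are illustrative.

```php
<?php
// Sample input: the same document as in the question.
$html_content = <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Title of the document</title>
</head>
<body>
<div id="x">Hello</div>
<p>world</p>
<h1>my name</h1>
</body>
</html>
HTML;

// Match every "<...>" run that contains no nested angle brackets.
// This captures the DOCTYPE, opening tags and closing tags individually.
// Assumption: no '<' or '>' appears inside attribute values, comments or scripts.
preg_match_all('/<[^<>]+>/', $html_content, $matches);
$tags = $matches[0];

// Keep only the "beginning" of each tag by dropping closing tags like </div>.
$opening = array_values(array_filter($tags, function ($tag) {
    return strncmp($tag, '</', 2) !== 0;
}));

print_r($tags);
print_r($opening);
```

With the sample input, $tags holds 16 entries, starting with <!DOCTYPE html>, <html>, <head>. $opening keeps 9 entries: the DOCTYPE plus the opening tags from <html> through <h1>.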
