How can I remove an HTML element and its contents using RegEx?
Question
I have a div I'd like to remove from some output. It looks like this:
<div id="ithis" class="cthis">Content here which includes other elements etc..) </div>
How can I remove this div and everything within it using PHP and regex?
Thanks.
Recommended answer
The simple answer is that you don't. Use one of PHP's many HTML parsers instead. Regexes are a flaky and error-prone way of manipulating HTML.
That being said, you can do this:
$html = preg_replace('!<div\s+id="ithis"\s+class="cthis">.*?</div>!is', '', $html);
But many things can go wrong with this. For example, if that div contains another div:
<div id="ithis" class="cthis">Content here which <div>includes</div> other elements etc..) </div>
You end up with:
other elements etc..) </div>
This happens because the lazy .*? in the pattern stops at the first </div> it finds. And no, there is nothing you can really do to solve this problem consistently with regular expressions.
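A minimal sketch that reproduces the failure: it runs the regex from the answer against the nested example from above. Only the sample string is taken from the text; the variable names are illustrative.

```php
<?php
// Sample input from the answer: the target div contains a nested <div>.
$html = '<div id="ithis" class="cthis">Content here which <div>includes</div> other elements etc..) </div>';

// The regex from the answer. The lazy .*? matches as little as possible,
// so the match ends at the FIRST </div>, i.e. the nested div's closing tag.
$result = preg_replace('!<div\s+id="ithis"\s+class="cthis">.*?</div>!is', '', $html);

// The tail of the outer div, including its orphaned closing tag, is left behind.
echo $result, PHP_EOL; // prints " other elements etc..) </div>"
```

Switching to a greedy .* does not fix this in general: on a page with several divs it would run to the last </div> in the document instead.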
Done with a parser, it looks more like this:
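libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$element = $doc->getElementById('ithis');
if ($element !== null) { // getElementById returns null if no element has that id
    $element->parentNode->removeChild($element); // removes the div and everything inside it
}
$html = $doc->saveHTML();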
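A self-contained sketch of the parser approach applied to the nested case. Two assumptions here are not part of the original answer: the surrounding <p> elements are a hypothetical page fragment, and the loop over the body's children is one way to avoid the DOCTYPE/html/body wrapper that loadHTML adds to fragments.

```php
<?php
// Hypothetical fragment: the target div (with a nested div) sits between two paragraphs.
$html = '<p>Before</p><div id="ithis" class="cthis">Content here which <div>includes</div> other elements etc..) </div><p>After</p>';

// Collect libxml warnings instead of printing them (loadHTML is noisy on real-world HTML).
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Guard against a missing id: getElementById returns null in that case.
$element = $doc->getElementById('ithis');
if ($element !== null) {
    // Removing the node drops all of its descendants too, nested divs included.
    $element->parentNode->removeChild($element);
}

// loadHTML wraps fragments in <html><body>; serialize only the body's children
// so the result does not gain a DOCTYPE or html/body wrapper.
$body = $doc->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}

// Only the two paragraphs should remain.
echo $result, PHP_EOL;
```

Unlike the regex, the parser knows which </div> closes which <div>, so the nested element does not cut the removal short.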