Regex Remove Images with style tag from Html

Views: 107 · Published: 2020/7/2 23:31:23 · Tags: php regex


Question


I'm new to regex, but I decided it was the easiest route to what I needed to do. Basically, I have a string (in PHP) that contains a whole load of HTML code, and I want to remove any tags that have style="display:none".

For example:

<img src="" style="display:none" />

<img src="" style="width:11px;display: none" >

etc.


So far my Regex is:

<img.*style=.*display.*:.*none;.* >


But that seems to leave bits of HTML behind, and it also removes the next element when used in PHP with preg_replace.
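To see why the next element disappears: every .* in the pattern above is greedy, so the final ".* >" runs to the last " >" on the line instead of stopping at the end of the current tag. The small check below uses a made-up one-line input (the sample string is an assumption, not taken from the question). It also shows that switching to lazy .*? only narrows the match; it still does not make the pattern understand tags. Note as well that the pattern requires a literal "none;", which neither sample in the question contains, so those two tags would not match at all.

```php
<?php
// Hypothetical input: one hidden image, then content that must be kept.
$html = '<img src="a" style="display:none;" > <p>keep</p> <img src="b" >';

// The question's pattern: every .* is greedy, so ".* >" reaches the LAST " >".
$greedy = preg_replace('/<img.*style=.*display.*:.*none;.* >/', '', $html);

// Same pattern with lazy quantifiers: stops at the FIRST " >" after "none;".
$lazy = preg_replace('/<img.*?style=.*?display.*?:.*?none;.*? >/', '', $html);

var_dump($greedy); // string(0) "" : the <p> and the second <img> are gone too
var_dump($lazy);   // only the first <img> is removed
```

The lazy version happens to work on this input, but it still treats HTML as flat text, so attribute order, quoting, or line breaks can break it again.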

Recommended answer

As Michael pointed out, you don't want to use regex for this. A regex does not know what an element tag is: <foo> is as meaningful to it as >foo< unless you teach it the difference, and teaching it the difference is incredibly tedious.

DOM is very convenient for this:

$html = <<< HTML
<img src="" style="display:none" />
<IMG src="" style="width:11px;display: none" >
<img src="" style="width:11px" >
HTML;


The above is our (invalid) markup. We feed it to DOM like this:

$dom = new DOMDocument();
// Parse the fragment; libxml adds the implied <html> and <body> elements.
$dom->loadHTML($html);
$dom->normalizeDocument();


Now we query the DOM for all "IMG" elements that have a "style" attribute containing the text "display". We could query for "display: none" directly in the XPath, but our input markup also has occurrences with no space in between:

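$xpath = new DOMXPath($dom);
foreach ($xpath->query('//img[contains(@style, "display")]') as $node) {
    // Strip all whitespace so "display: none" and "display:none" compare equal.
    $style = preg_replace('/\s+/', '', $node->getAttribute('style'));
    if (strpos($style, 'display:none') !== false) {
        // The XPath result is a snapshot, so removing nodes here is safe.
        $node->parentNode->removeChild($node);
    }
}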


We iterate over the IMG nodes and remove all whitespace from their style attribute content. Then we check if it contains "display:none" and if so, remove the element from the DOM.
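The substring test treats the style attribute as plain text. If you also want to accept upper-case values such as "DISPLAY: NONE" or a trailing "!important", one option is to split the attribute into declarations and compare property and value separately. This is a hypothetical helper (the name isDisplayNone is mine, not part of the answer), and it assumes no semicolons appear inside values such as url(...).

```php
<?php
// Hypothetical helper: true when an inline style string declares display:none.
// Assumes ";" only separates declarations (no ";" inside url(...) or quotes).
function isDisplayNone(string $style): bool
{
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // skip empty or malformed declarations
        }
        $property = strtolower(trim($parts[0]));
        // Drop a trailing "!important" before comparing the value.
        $value = strtolower(trim(preg_replace('/!\s*important\s*$/i', '', $parts[1])));
        if ($property === 'display' && $value === 'none') {
            return true;
        }
    }
    return false;
}
```

To use it, select candidates with '//img[@style]' (XPath contains() is case-sensitive) and call isDisplayNone($node->getAttribute('style')) inside the loop instead of the strpos() check.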


Now we only need to save our HTML:

echo $dom->saveHTML();

This gives us:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><img src="" style="width:11px"></body></html>
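Calling saveHTML() without arguments serializes the whole document, which is why the DOCTYPE, <html> and <body> appear above. If you need the cleaned fragment back, one approach is to wrap the input in a container element and serialize only that container's children by passing each node to saveHTML(). The sketch below assumes that approach; removeHiddenImages is a hypothetical name, and unlike the answer it also lower-cases the style value before testing it.

```php
<?php
// Hypothetical function: removes <img> elements whose inline style contains
// display:none and returns the fragment without DOCTYPE/<html>/<body>.
function removeHiddenImages(string $html): string
{
    $dom = new DOMDocument();
    // Collect parser warnings about messy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so all of its nodes share one known parent element.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//img[@style]') as $node) {
        // Same idea as the answer: drop whitespace, then look for display:none.
        $style = strtolower(preg_replace('/\s+/', '', $node->getAttribute('style')));
        if (strpos($style, 'display:none') !== false) {
            $node->parentNode->removeChild($node);
        }
    }

    // The wrapper is the first <div> in document order.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child); // serialize one node at a time
    }
    return $out;
}
```

One caveat: loadHTML() treats input without a charset declaration as ISO-8859-1, so UTF-8 text with non-ASCII characters may need an explicit charset hint.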

Screw regex!


Addendum: you might also be interested in Parsing XML documents with CSS selectors
