Regex for selective stripping of HTML

Views: 77  Published: 2018/6/23 15:49:49  Tags: php, html, regex

Problem description

I'm trying to parse some HTML with PHP as an exercise, outputting it as just text, and I've hit a snag. I'd like to remove any tags that are hidden with style="display: none;" - bearing in mind that the tag may contain other attributes and style properties.

The code I have so far is this:

$page = preg_replace("#<([a-z]+).*?style=\".*?display:\s*none[^>]*>.*?</\1>#s","",$page);

The code is returning NULL with a PREG_BACKTRACK_LIMIT_ERROR.
I tried this instead:

$page = preg_replace("#<([a-z]+)[^>]*?style=\"[^\"]*?display:\s*none[^>]*>.*?</\1>#s","",$page);

But now it's just not replacing any tags.

Any help would be much appreciated. Thanks!

Solution

Using DOMDocument, you can try something like this:

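$doc = new DOMDocument;
$doc->loadHTMLFile("foo.html");
// Copy the live node list first; removing nodes while iterating it skips elements.
$nodes = iterator_to_array($doc->getElementsByTagName('*'));
foreach ($nodes as $node) {
    // Allow any amount of whitespace between "display:" and "none".
    if (preg_match('/display:\s*none/i', $node->getAttribute('style'))) {
        // A node has to be removed through its own parent, not through the document.
        $node->parentNode->removeChild($node);
    }
}
$doc->saveHTMLFile("foo.html");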
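
Since the question works on an HTML string in $page rather than on a file, the DOMDocument approach above could also be wrapped in a small function. The following is only a sketch under a few assumptions: the function name stripHiddenElements and the sample markup are made up for illustration, and the hidden elements are assumed to use an inline style attribute (rules in stylesheets are not detected). It removes each node through its parentNode and copies the node list before looping, because the list returned by getElementsByTagName is live.

```php
<?php
// Hypothetical helper (not part of the original answer): removes every element
// whose inline style contains "display: none" from an HTML string.
function stripHiddenElements(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Snapshot the live node list so that removals do not disturb the iteration.
    $elements = iterator_to_array($doc->getElementsByTagName('*'));

    foreach ($elements as $element) {
        $style = $element->getAttribute('style');
        // Tolerate "display:none", "display: none" and upper-case variants.
        if (preg_match('/display\s*:\s*none/i', $style) && $element->parentNode !== null) {
            // Removing the element also removes everything nested inside it.
            $element->parentNode->removeChild($element);
        }
    }

    return $doc->saveHTML();
}

// Made-up sample input, standing in for the $page variable from the question.
$page = '<div><p>Visible</p>'
      . '<p class="x" style="color: red; display:none;">Hidden <b>text</b></p></div>';

$clean = stripHiddenElements($page);

// The question asks for plain text, so drop the remaining tags afterwards.
echo trim(strip_tags($clean)), "\n";
```

Note that loadHTML wraps a fragment in html and body elements, so the string returned by saveHTML is a full document; strip_tags then reduces it to the visible text.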
