Best way to parse invalid HTML in PHP
Problem description
Is there a better approach to parsing invalid HTML than running Tidy on it?
Side note: there are situations where Tidy is not available. I also understand that regular expressions are not recommended for parsing HTML.
I would try something like this: http://php.net/manual/en/domdocument.loadhtml.php
From that page:
The function parses the HTML contained in the string source. Unlike loading XML, HTML does not have to be well-formed to load. This function may also be called statically to load and create a DOMDocument object.
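As a minimal sketch of that suggestion, the snippet below loads a deliberately malformed fragment with `DOMDocument::loadHTML()` and reads the paragraphs back through the DOM API. The sample markup and variable names are made up for illustration. It assumes the standard `dom` and `libxml` extensions are enabled, and it calls `loadHTML()` on an instance, because current PHP versions no longer allow the static call mentioned in the quote.

```php
<?php
// Hypothetical broken input: unclosed <p> and <b>, no <html> or <body>.
$html = '<div><p>First paragraph<p>Second <b>bold text</div>';

// Collect libxml parse problems internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Keep the recorded problems for inspection, then reset libxml's state.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Walk the repaired tree with the normal DOM API.
$paragraphs = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $paragraphs[] = trim($p->textContent);
}

print_r($paragraphs);
```

`libxml_use_internal_errors(true)` is what keeps invalid markup from flooding the output with warnings; the `$errors` array can be logged if you want to know what the parser had to repair.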