Need a good HTML parser for PHP
Question
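I found this one, http://simplehtmldom.sourceforge.net/, but it failed to work. I tried extracting this page, http://php.net/manual/en/function.curl-setopt.php, and parsing it to plain HTML. It failed and returned only a partial HTML page.

This is what I want to do: go to an HTML page and get its components individually (the contents of all div and p elements, in a hierarchy).

I like the features of simplehtmldom. I need a parser like it that copes well with all kinds of markup, from the best to the worst.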
Answer

I often use DOMDocument::loadHTML, which works reasonably well in the general case. Once a document is loaded as a DOM, I like querying it with XPath.

Unfortunately, if an HTML page is really badly formed, some parsing problems can occur in certain cases. That's when you start to understand that respecting web standards is a great idea.
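Below is a minimal sketch of the approach described above. It assumes PHP 7.1+ with the bundled dom extension. extractBlocks() is a hypothetical helper name. The HTML sample is made up and deliberately malformed, so it is not the php.net page from the question, and the sketch does not fetch a live page (for example with cURL). The helper loads the markup with DOMDocument::loadHTML. It uses libxml_use_internal_errors() so that parser warnings are collected instead of printed. It then selects every div and p with DOMXPath and records each element's nesting depth.

```php
<?php
/**
 * Hypothetical helper: parse an HTML string with DOMDocument::loadHTML and
 * return every <div> and <p> in document order, with its nesting depth
 * (number of enclosing <div>/<p> elements) and its whitespace-normalised text.
 */
function extractBlocks(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings instead of printing them; badly formed pages
    // often trigger several. Restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $blocks = [];

    // A single location path, so nodes come back in document order.
    foreach ($xpath->query('//*[self::div or self::p]') as $node) {
        $blocks[] = [
            'tag'   => $node->nodeName,
            // Depth = number of <div>/<p> ancestors, computed with XPath.
            'depth' => (int) $xpath->evaluate('count(ancestor::div | ancestor::p)', $node),
            'text'  => trim(preg_replace('/\s+/', ' ', $node->textContent)),
        ];
    }

    return ['blocks' => $blocks, 'errors' => $errors];
}

// Made-up sloppy sample: unquoted attributes, an unclosed <p>, a stray </span>.
$html = <<<HTML
<html><body>
<div id=main>
  <p>First paragraph</p>
  <div class=inner>
    <p>Nested paragraph
  </div>
  <p>Last paragraph</span>
</div>
</body></html>
HTML;

$result = extractBlocks($html);
foreach ($result['blocks'] as $block) {
    // Indent by depth to show the hierarchy.
    echo str_repeat('  ', $block['depth']), '<', $block['tag'], '> ', $block['text'], PHP_EOL;
}
echo count($result['errors']), " libxml warning(s) collected", PHP_EOL;
```

The depth comes from the XPath expression count(ancestor::div | ancestor::p). This keeps the hierarchy information in a flat list without writing a recursive tree walk. The recovery from broken markup is whatever libxml's HTML parser decides, and on really badly formed pages it may not match what a browser builds.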