Parse HTML via XPath
Problem description
In .NET, I found a great library, HtmlAgilityPack, that lets you easily parse non-well-formed HTML using XPath. I've used it for a couple of years in my .NET sites, but I've had to settle for more painful libraries in my Python, Ruby, and other projects. Is anyone aware of similar libraries for other languages?
Solution
In Python, ElementTidy parses tag soup and produces an element tree, which you can then query with ElementTree's XPath-style path expressions (a subset of XPath):
>>> from elementtidy.TidyHTMLTreeBuilder import TidyHTMLTreeBuilder as TB
>>> tb = TB()
>>> tb.feed("<p>Hello world")
>>> e = tb.close()
>>> e.find(".//{http://www.w3.org/1999/xhtml}p")
<Element {http://www.w3.org/1999/xhtml}p at 264eb8>
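If ElementTidy isn't installable in your environment, the same idea (turn tag soup into an element tree, then query it with a path expression) can be roughly sketched with only the standard library. This is not ElementTidy and does no real HTML clean-up: `SoupTreeBuilder` and `parse_soup` are hypothetical names made up for this illustration, the synthetic `html` root is an assumption of the sketch, and unclosed tags are simply left open until the end of input. Because Tidy is not involved, elements carry no XHTML namespace here.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have content, so they are not pushed on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SoupTreeBuilder(HTMLParser):
    """Hypothetical helper: builds an ElementTree from loosely formed HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Assumption of this sketch: everything hangs off a synthetic <html> root.
        self.root = ET.Element("html")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        attrib = {name: (value or "") for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text that follows a child element belongs in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_soup(html):
    """Parse an HTML string and return the synthetic root element."""
    builder = SoupTreeBuilder()
    builder.feed(html)
    builder.close()  # flushes any text still buffered by the parser
    return builder.root


tree = parse_soup("<p>Hello world")
p = tree.find(".//p")  # ElementTree supports a subset of XPath
print(p.text)
```

`find` and `findall` accept only ElementTree's limited path syntax (for example `.//p` or `.//p[@class='x']`), so this is a stand-in for simple queries rather than a full XPath engine.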