Python html parsing that actually works

Views: 114 | Published: 2018/6/19 19:54:34 | Tags: python html parsing

Question

I'm trying to parse some html in Python. There were some methods that actually worked before... but nowadays there's nothing I can actually use without workarounds.

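BeautifulSoup has had problems ever since SGMLParser went away.

html5lib cannot parse half of what is "out there".

lxml tries to be "too correct" for typical html (attributes and tags cannot contain unknown namespaces, or an exception is thrown, which means almost no page with Facebook Connect can be parsed).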

What other options are there these days? (if they support xpath, that would be great)

Recommended answer

Make sure that you use the html module when you parse HTML with lxml:

    >>> from lxml import html
    >>> doc = """<html>
    ... <head>
    ...   <title> Meh
    ... </head>
    ... <body>
    ... Look at this interesting use of <p>
    ... rather than using <br /> tags as line breaks <p>
    ... </body>"""
    >>> html.document_fromstring(doc)
    <Element html at ...>

All the errors and exceptions will melt away, and you'll be left with an amazingly fast parser that often deals with HTML soup better than BeautifulSoup.
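
Since the question also asks about XPath, here is a minimal sketch of how the same lxml.html entry point can be combined with an XPath query. It assumes lxml is installed. The sample markup, the fb:like tag, and the helper name extract_links are made up for illustration and are not part of the original answer.

```python
from lxml import html

# Deliberately sloppy, made-up markup: stray <p> tags, an unclosed <a>,
# a Facebook-style namespaced tag, and no closing </html>.
SOUP = """<html>
<head>
  <title>Meh</title>
</head>
<body>
Look at this interesting use of <p>
rather than using <br /> tags as line breaks <p>
<fb:like href="http://example.com/"></fb:like>
<a href="/first">first link</a>
<a href="/second">second link
</body>"""


def extract_links(markup):
    """Parse tag soup with lxml.html and return every <a> href as a plain str."""
    # document_fromstring uses lxml's forgiving HTML parser, not the strict XML one.
    doc = html.document_fromstring(markup)
    # xpath() is available on the parsed element; attribute results behave like strings.
    return [str(href) for href in doc.xpath("//a/@href")]


links = extract_links(SOUP)
title = html.document_fromstring(SOUP).xpath("string(//title)").strip()
print(title, links)
```

The key design choice is going through lxml.html rather than lxml.etree: the HTML parser recovers from malformed input instead of rejecting it, while the resulting tree still supports the usual xpath() calls.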
