Parsing HTML in Python

Question

What's my best bet for parsing HTML if I can't use BeautifulSoup or lxml? I've got some code that uses SGMLlib but it's a bit low-level and it's now deprecated.

I would prefer it if it could stomach a bit of malformed HTML, although I'm pretty sure most of the input will be fairly clean.

Answer

Python has a native HTML parser (the html.parser module in Python 3, called HTMLParser in Python 2); however, the Tidy wrapper Nick suggested would probably be a solid choice as well. Tidy is a very common library, written in C.
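
A minimal sketch of the native-parser route, assuming Python 3 and its standard-library html.parser module. The class name LinkCollector and the helper parse() are hypothetical names chosen for illustration. The sketch subclasses HTMLParser and overrides the handle_starttag and handle_data callbacks. The sample input deliberately leaves tags unclosed and includes a stray end tag; the parser reports these through its callbacks instead of raising an error, because it does not validate nesting.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical example: collect <a href> values and non-blank text."""

    def __init__(self):
        # convert_charrefs=True (the default) turns entities like &amp; into characters
        super().__init__(convert_charrefs=True)
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # tag is lowercased; attrs is a list of (name, value) pairs, value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep only text chunks that are not pure whitespace
        if data.strip():
            self.text_parts.append(data.strip())


def parse(html):
    """Hypothetical helper: feed a string and return (links, text_parts)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()  # flush any buffered data
    return parser.links, parser.text_parts


# Slightly malformed input: unclosed <p> and <b>, plus a stray </div>
sample = '<p>Hello <b>world<p><a href="/docs">Docs &amp; more</a></div>'
links, text = parse(sample)
print(links)
print(text)
```

Because HTMLParser is event-driven and does not build a tree, it works at roughly the same level as sgmllib; if you need a document tree, you have to assemble one yourself from these callbacks.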