Parsing HTML in Python
Question
What's my best bet for parsing HTML if I can't use BeautifulSoup or lxml? I've got some code that uses sgmllib, but it's a bit low-level and it's now deprecated.
I would prefer it if the parser could stomach a bit of malformed HTML, although I'm pretty sure most of the input will be pretty clean.
Recommended answer
Python has a native HTML parser in its standard library (the HTMLParser module, renamed html.parser in Python 3). However, the Tidy wrapper Nick suggested would probably be a solid choice as well. Tidy is a very common library (written in C, I believe).
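For what it's worth, here is a minimal sketch of the native-parser route, assuming Python 3 and its html.parser module. The LinkCollector class and extract_links helper are made-up names for illustration, not part of the standard library. The parser is event-based, so unclosed tags in slightly malformed input generally don't stop it from reporting the start tags it sees.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical example class: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Hypothetical helper: return every href found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # The <b> and second <a> are never closed; the parser still reports them.
    sample = '<p>Hi <a href="/a">one</a><b><A HREF="/b">two</p>'
    print(extract_links(sample))
```

Note that this only gives you a stream of tag events; it does not repair the markup or build a tree, so something like Tidy is still the better fit if you need cleaned-up output.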