Parsing HTML in python - lxml or BeautifulSoup? Which of these is better for what kinds of purposes?

Question

From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I've chosen BeautifulSoup for a project I'm working on, but I chose it for no particular reason other than finding the syntax a bit easier to learn and understand. But I see a lot of people seem to favour lxml and I've heard that lxml is faster.
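To make the syntax comparison concrete, here is a minimal sketch of the same task, collecting every link's `href`, written once with Beautiful Soup and once with lxml. It assumes the current `beautifulsoup4` package (Beautiful Soup 4, imported as `bs4`) and `lxml` are both installed; `SAMPLE`, `links_with_bs4` and `links_with_lxml` are hypothetical names chosen for illustration.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
import lxml.html  # third-party: pip install lxml

# Hypothetical sample page; the <p> and <li> tags are deliberately left unclosed.
SAMPLE = """
<html><body>
  <p class="intro">Links: <a href="/a">A</a>, <a href="/b">B</a>
  <ul><li>unclosed item<li>another
</body></html>
"""


def links_with_bs4(markup: str) -> list[str]:
    # "html.parser" is Python's built-in backend; Beautiful Soup navigates
    # the tree with Python methods such as find_all().
    soup = BeautifulSoup(markup, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def links_with_lxml(markup: str) -> list[str]:
    # lxml builds an element tree and queries it with XPath.
    tree = lxml.html.fromstring(markup)
    return [str(href) for href in tree.xpath("//a/@href")]


if __name__ == "__main__":
    print(links_with_bs4(SAMPLE))
    print(links_with_lxml(SAMPLE))
```

The difference is mostly in style: Beautiful Soup exposes Pythonic navigation methods, while lxml leans on XPath (or CSS selectors through the separate `cssselect` package). Beautiful Soup 4 can also use lxml as its underlying parser via `BeautifulSoup(markup, "lxml")`, so the two are not strictly either/or.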

So I'm wondering what are the advantages of one over the other? When would I want to use lxml and when would I be better off using BeautifulSoup? Are there any other libraries worth considering?

Answer

For starters, BeautifulSoup is no longer actively maintained, and the author even recommends alternatives such as lxml.

Quoting the linked page:

Version 3.1.0 of Beautiful Soup does significantly worse on real-world HTML than version 3.0.8 does. The most common problems are handling tags incorrectly, "malformed start tag" errors, and "bad end tag" errors. This page explains what happened, how the problem will be addressed, and what you can do right now.

This page was originally written in March 2009. Since then, the 3.2 series has been released, replacing the 3.1 series, and development of the 4.x series has gotten underway. This page will remain up for historical purposes.

tl;dr

Use 3.2.0 instead.
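The quoted page's most common failures involve mis-handled tags and "malformed start tag" / "bad end tag" errors. For comparison, here is a minimal sketch of lxml, the alternative the answer mentions, parsing markup with an unclosed `<p>` and a stray `</span>`. It assumes `lxml` is installed; `BROKEN` and `paragraph_texts` are hypothetical names, and the sample is illustrative rather than the markup that triggered those Beautiful Soup errors.

```python
import lxml.html  # third-party: pip install lxml

# Hypothetical malformed fragment: the first <p> is never closed and
# </span> has no matching start tag.
BROKEN = "<div><p>one<p>two</div></span>"


def paragraph_texts(markup: str) -> list[str]:
    # lxml's HTML parser recovers from errors by default instead of raising:
    # the second <p> implicitly closes the first, and the stray </span> is dropped.
    root = lxml.html.fromstring(markup)
    return [str(p.text_content()) for p in root.iter("p")]


if __name__ == "__main__":
    print(paragraph_texts(BROKEN))
```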
