BeautifulSoup and lxml.html - what to prefer?

Question

I am working on a project that will involve parsing HTML.

After searching around, I found two probable options: BeautifulSoup and lxml.html.

Is there any reason to prefer one over the other? I used lxml for XML some time back and I feel I would be more comfortable with it; however, BeautifulSoup seems to be much more common.

I know I should use the one that works for me, but I was looking for personal experiences with both.

Recommended answer

The simple answer, imo, is that if you trust your source to be well-formed, go with the lxml solution. Otherwise, BeautifulSoup all the way.
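
As a rough illustration of the lxml route (a minimal sketch, assuming the lxml package is installed; the sample markup and variable names are made up for this example), parsing a snippet and pulling out its links might look like this:

```python
import lxml.html  # third-party: pip install lxml

# Hypothetical sample markup, used only for illustration.
sample_html = "<ul><li><a href='/a'>First</a></li><li><a href='/b'>Second</a></li></ul>"

# Parse the string into an element tree.
doc = lxml.html.fromstring(sample_html)

# Collect (link text, href) pairs with an XPath query.
lxml_links = [(a.text_content(), a.get("href")) for a in doc.xpath("//a")]
print(lxml_links)
```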

This answer is three years old now; it's worth noting, as Jonathan Vanasco does in the comments, that BeautifulSoup4 now supports using lxml as the internal parser, so you can use the advanced features and interface of BeautifulSoup without most of the performance hit, if you wish (although I still reach straight for lxml myself -- perhaps it's just force of habit :)).
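
For what it's worth, using lxml as the parser behind BeautifulSoup4 comes down to passing a parser name to the constructor. A minimal sketch, assuming both the beautifulsoup4 and lxml packages are installed (the sample markup and variable names are invented for this example):

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4 lxml

# Hypothetical sample markup with unclosed <li> tags, used only for illustration.
messy_html = "<ul><li><a href='/a'>First</a><li><a href='/b'>Second</a></ul>"

# The second argument selects the underlying parser; "lxml" uses lxml's HTML parser.
soup = BeautifulSoup(messy_html, "lxml")

# The BeautifulSoup interface stays the same whichever parser is chosen.
bs_links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]
print(bs_links)
```

Swapping "lxml" for "html.parser" falls back to the parser in Python's standard library, which needs no extra install but is generally slower.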
