BeautifulSoup and lxml.html - what to prefer?

Question

I am working on a project that will involve parsing HTML.

After searching around, I found two likely options: BeautifulSoup and lxml.html.

Is there any reason to prefer one over the other? I used lxml for XML some time back and I feel I would be more comfortable with it; however, BeautifulSoup seems to be much more common.

I know I should use the one that works for me, but I was looking for personal experiences with both.

Recommended answer

The simple answer, imo, is that if you trust your source to be well-formed, go with the lxml solution. Otherwise, BeautifulSoup all the way.
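To make the trade-off concrete, here is a minimal sketch of both routes. The sample markup and variable names are made up for illustration, and it assumes the `lxml` and `beautifulsoup4` packages are installed. It shows lxml.html querying a well-formed page with XPath, and BeautifulSoup (with the standard-library `html.parser` backend) still finding the links in markup where tags were never closed.

```python
import lxml.html
from bs4 import BeautifulSoup

# Well-formed input (hypothetical sample): query it directly with lxml.html.
clean_html = (
    "<html><body>"
    "<p class='intro'>Hello</p>"
    "<a href='/docs'>Docs</a>"
    "</body></html>"
)
tree = lxml.html.fromstring(clean_html)
links = tree.xpath("//a/@href")                   # href attribute of every <a>
intro = tree.xpath("//p[@class='intro']/text()")  # text of the intro paragraph

# Sloppy input (hypothetical sample): neither <a> tag is ever closed.
messy_html = "<div><a href='/a'>first<a href='/b'>second</div>"
soup = BeautifulSoup(messy_html, "html.parser")
hrefs = [a["href"] for a in soup.find_all("a")]   # both links are still found

print(links, intro, hrefs)
```

How each library repairs broken markup differs between parsers and versions, so it is worth trying both on a sample of your real input before committing to one.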

Edit:

This answer is three years old now; it's worth noting, as Jonathan Vanasco does in the comments, that BeautifulSoup4 now supports using lxml as the internal parser, so you can use the advanced features and interface of BeautifulSoup without most of the performance hit, if you wish (although I still reach straight for lxml myself -- perhaps it's just force of habit :)).
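As a rough illustration of that combination, the sketch below passes `"lxml"` as the parser name to BeautifulSoup4, so lxml does the parsing while the BeautifulSoup interface is used for navigation. The sample markup and variable names are invented for the example, and it assumes both `beautifulsoup4` and `lxml` are installed (BeautifulSoup raises an error if the requested parser is not available).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html_text = "<ul><li class='item'>one</li><li class='item'>two</li></ul>"

# The second argument selects the underlying parser; "lxml" delegates
# parsing to lxml while keeping the BeautifulSoup API on top.
soup = BeautifulSoup(html_text, "lxml")

# Navigate with the usual BeautifulSoup calls.
items = [li.get_text() for li in soup.find_all("li", class_="item")]

print(items)
```

Swapping `"lxml"` for `"html.parser"` keeps the same code working without the extra dependency, at the cost of whatever speed lxml would have provided.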
