Parsing HTML in Python - lxml or BeautifulSoup? Which of these is better for what kinds of purposes?

Question

From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I chose BeautifulSoup for a project I'm working on, for no particular reason other than finding its syntax a bit easier to learn and understand. But a lot of people seem to favour lxml, and I've heard that lxml is faster.

So I'm wondering: what are the advantages of one over the other? When would I want to use lxml, and when would I be better off using BeautifulSoup? Are there any other libraries worth considering?
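
As a rough illustration of how the two libraries differ in syntax, here is a minimal sketch that pulls link targets out of a slightly malformed snippet with each one. It assumes the third-party packages `lxml` and `beautifulsoup4` (the 4.x series, imported as `bs4`) are installed; the sample HTML and the helper names `links_with_lxml` and `links_with_bs4` are made up for this example.

```python
# Sketch only: compares the two APIs on the same input.
# Assumption: `lxml` and `beautifulsoup4` are third-party packages that may
# or may not be installed, so each helper returns None when its import fails.

# Hypothetical sample input with unclosed <p> tags.
HTML = (
    '<html><body><p>Intro <a href="/a">first</a>'
    '<p><a href="/b">second</a></body></html>'
)


def links_with_lxml(html):
    """Return href values found via lxml.html, or None if lxml is missing."""
    try:
        import lxml.html
    except ImportError:
        return None
    root = lxml.html.fromstring(html)
    # XPath selects the href attribute of every <a> element.
    return [str(href) for href in root.xpath("//a/@href")]


def links_with_bs4(html):
    """Return href values found via Beautiful Soup 4, or None if bs4 is missing."""
    try:
        from bs4 import BeautifulSoup
    except ImportError:
        return None
    # "html.parser" is the standard-library backend; bs4 can also use lxml.
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    for name, extract in (("lxml", links_with_lxml), ("bs4", links_with_bs4)):
        print(name, extract(HTML))
```

The lxml version leans on XPath, while the Beautiful Soup version uses method calls such as `find_all`; which reads more naturally is largely a matter of taste.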

Recommended answer

For starters, BeautifulSoup is no longer actively maintained, and the author even recommends alternatives such as lxml.

Quoting from the linked page:

Version 3.1.0 of Beautiful Soup does significantly worse on real-world HTML than version 3.0.8 does. The most common problems are handling tags incorrectly, "malformed start tag" errors, and "bad end tag" errors. This page explains what happened, how the problem will be addressed, and what you can do right now.

This page was originally written in March 2009. Since then, the 3.2 series has been released, replacing the 3.1 series, and development of the 4.x series has gotten underway. This page will remain up for historical purposes.

TL;DR

Use 3.2.0 instead.
