Python: difference between "lxml", "html.parser" and "html5lib" with Beautiful Soup?

Question

When using Beautiful Soup, what is the difference between "lxml", "html.parser" and "html5lib"? When would you use one over the others, and what are the benefits of each? From the times I have used them, they seem to be interchangeable, but people here have corrected me that I should be using a different one. I would like to strengthen my understanding of these. I have read a couple of posts here about this, but they barely cover the use cases at all.

Example:

soup = BeautifulSoup(response.text, 'lxml')
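
One reason the parsers can look interchangeable is that, on well-formed markup, they build equivalent trees, so ordinary lookups return the same results. The following is a minimal sketch, assuming the `bs4` package is installed; `extract_links`, `compare_parsers` and the sample markup are hypothetical names made up for this illustration. Parsers that are not installed (lxml and html5lib are separate packages) are skipped by catching Beautiful Soup's `FeatureNotFound` exception.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Well-formed markup: every parser should build an equivalent tree for it.
WELL_FORMED = """
<html><body>
  <ul id="links">
    <li><a href="/a">A</a></li>
    <li><a href="/b">B</a></li>
  </ul>
</body></html>
"""


def extract_links(markup, parser):
    """Parse markup with the named parser and return (href, text) pairs."""
    soup = BeautifulSoup(markup, parser)
    return [(a["href"], a.get_text()) for a in soup.find_all("a")]


def compare_parsers(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Run extract_links with each parser, skipping parsers that are not installed."""
    results = {}
    for parser in parsers:
        try:
            results[parser] = extract_links(markup, parser)
        except FeatureNotFound:
            # lxml and html5lib are optional installs; html.parser ships with Python.
            continue
    return results


if __name__ == "__main__":
    for parser, links in compare_parsers(WELL_FORMED).items():
        print(f"{parser:12} {links}")
```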

Answer

From the documentation's summary table of advantages and disadvantages:

  1. html.parser - BeautifulSoup(markup, "html.parser")

  • Advantages: Batteries included, Decent speed, Lenient (as of Python 2.7.3 and 3.2)

  • Disadvantages: Not very lenient (before Python 2.7.3 or 3.2.2)

  2. lxml - BeautifulSoup(markup, "lxml")

  • Advantages: Very fast, Lenient

  • Disadvantages: External C dependency

  3. html5lib - BeautifulSoup(markup, "html5lib")

  • Advantages: Extremely lenient, Parses pages the same way a web browser does, Creates valid HTML5

  • Disadvantages: Very slow, External Python dependency
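
The "Lenient" entries above mostly matter on broken markup. The sketch below, assuming `bs4` is installed, parses the invalid fragment `<a></p>` (the fragment the Beautiful Soup documentation uses to compare parsers) with each available parser; `parse_all` and `describe` are hypothetical helper names for this illustration. Missing optional parsers are skipped rather than treated as errors.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Invalid markup: a closing </p> with no opening <p>.
# Each parser repairs it in its own way.
BROKEN = "<a></p>"

PARSERS = ("html.parser", "lxml", "html5lib")


def parse_all(markup, parsers=PARSERS):
    """Return {parser_name: BeautifulSoup} for every parser that is installed."""
    soups = {}
    for parser in parsers:
        try:
            soups[parser] = BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # Raised when an optional parser (lxml or html5lib) is not installed.
            pass
    return soups


def describe(soup):
    """Summarise how a parser repaired the markup."""
    return {
        "serialized": str(soup),
        "has_body": soup.body is not None,     # did the parser add <html><body>?
        "kept_p": soup.find("p") is not None,  # did the stray </p> become an element?
    }


if __name__ == "__main__":
    for name, soup in parse_all(BROKEN).items():
        print(f"{name:12} {describe(soup)}")
```

According to the Beautiful Soup documentation, html.parser turns this into `<a></a>`, lxml into `<html><body><a></a></body></html>`, and html5lib into `<html><head></head><body><a><p></p></a></body></html>`, keeping the stray `</p>` as an empty paragraph the way a browser would. If you call `BeautifulSoup(markup)` without naming a parser, Beautiful Soup uses the best one installed (lxml, then html5lib, then html.parser), so the same script can build different trees on different machines; naming the parser explicitly avoids that.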
