Don't put html, head and body tags automatically, beautifulsoup

Views: 18 | Published: 2021/12/23 19:51:32 | Tags: python, beautifulsoup, html5lib

Question

When I use BeautifulSoup with the html5lib parser, it adds the html, head and body tags automatically:

BeautifulSoup('<h1>FOO</h1>', 'html5lib') # => <html><head></head><body><h1>FOO</h1></body></html>

Is there an option I can set to turn off this behavior?

Recommended answer

In [35]: import bs4 as bs

In [36]: bs.BeautifulSoup('<h1>FOO</h1>', "html.parser")
Out[36]: <h1>FOO</h1>

This parses the HTML with Python's built-in HTML parser instead of html5lib. Quoting the docs:

Unlike html5lib, this parser makes no attempt to create a well-formed HTML document by adding a <body> tag. Unlike lxml, it doesn’t even bother to add an <html> tag.
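
As a minimal sketch of the behavior the docs describe, the snippet below parses the same fragment with "html.parser" and checks that no wrapper tags were created. The helper name parse_fragment is hypothetical and only used for illustration; it assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup


def parse_fragment(markup):
    """Parse markup with Python's built-in parser (hypothetical helper).

    html.parser does not wrap the input in <html>, <head> or <body>.
    """
    return BeautifulSoup(markup, "html.parser")


soup = parse_fragment("<h1>FOO</h1>")
print(soup)       # the fragment as parsed, with no wrapper tags
print(soup.html)  # None, because no <html> tag was created
print(soup.body)  # None, because no <body> tag was created
```

Note that html.parser is more lenient than html5lib with malformed markup, so the resulting tree can differ for broken HTML; this only matters if the input is not well-formed.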


Alternatively, you could keep the html5lib parser and just select the element that comes right after <body>:

In [61]: soup = bs.BeautifulSoup('<h1>FOO</h1>', 'html5lib')

In [62]: soup.body.next
Out[62]: <h1>FOO</h1>
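
Selecting the element after <body> returns only the first node, so a fragment with several top-level elements would be cut short. One possible variation, sketched here under the assumption that both bs4 and html5lib are installed, is to serialize everything inside <body> with decode_contents(). The helper name body_fragment is hypothetical.

```python
from bs4 import BeautifulSoup


def body_fragment(markup):
    """Parse with html5lib, then return only the markup inside <body>.

    Hypothetical helper: html5lib still builds the full document,
    the wrapper tags are simply left out of the returned string.
    """
    soup = BeautifulSoup(markup, "html5lib")
    return soup.body.decode_contents()


# Both top-level elements are kept, not just the first one.
print(body_fragment("<h1>FOO</h1><p>bar</p>"))
```

This returns a string rather than a tag; if the nodes themselves are needed, soup.body.contents gives the list of children of <body>.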
