Editing DOCTYPE tag with BeautifulSoup

Question

I need to add an ATTLIST declaration to the DOCTYPE tag in HTML documents.

After reading the documentation and searching online, this is what I came up with:

from bs4 import BeautifulSoup, Doctype

# minimal html document
doc = """<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" >
<html/>"""

soup = BeautifulSoup(doc, features='html.parser')

# the modified doctype tag
doctype = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
[<!ATTLIST span bodyref CDATA #IMPLIED>] >"""

dt = BeautifulSoup(doctype, features='html.parser')

for item in soup.contents:
    if isinstance(item, Doctype):
        item.replace_with(dt)
        break

print(soup.prettify(formatter=None))

This produces the desired result, but it feels a bit "hacky". I'd like to insert just the ATTLIST part into the existing tag rather than replace the whole thing, as I've done here. Does anyone know how to do that?

Recommended answer

A small improvement would be to build a Doctype object and replace the existing one with it, for example:

from bs4 import BeautifulSoup, Doctype

# minimal html document
doc = """<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" >
<html/>"""

# the modified doctype tag
doctype = """html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
[<!ATTLIST span bodyref CDATA #IMPLIED>]"""

soup = BeautifulSoup(doc, features='html.parser')

for item in soup.contents:
    if isinstance(item, Doctype):
        item.replace_with(Doctype(doctype))
        break

print(soup.prettify(formatter=None))

This gives:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
[<!ATTLIST span bodyref CDATA #IMPLIED>]>
<html>
</html>
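
If the goal is to avoid retyping the public and system identifiers, one possible variation is to build the new Doctype from the text of the existing one. The sketch below assumes that html.parser stores everything between "<!DOCTYPE " and ">" in the Doctype string, and relies on Doctype being a str subclass; `add_internal_subset` is a hypothetical helper name, not part of bs4. It still replaces the node, since strings in the tree are immutable, but only the ATTLIST part has to be supplied.

```python
from bs4 import BeautifulSoup, Doctype


def add_internal_subset(soup, declaration):
    """Append an internal DTD subset to the first top-level Doctype.

    Hypothetical helper for illustration. Returns True if a Doctype
    was found and replaced, False otherwise.
    """
    for item in soup.contents:
        if isinstance(item, Doctype):
            # Doctype is a str subclass, so plain string methods work.
            # rstrip() drops the whitespace left before the closing ">".
            new_text = item.rstrip() + "\n[" + declaration + "]"
            item.replace_with(Doctype(new_text))
            return True
    return False


# minimal html document, same as above
doc = """<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" >
<html/>"""

soup = BeautifulSoup(doc, features="html.parser")
found = add_internal_subset(soup, "<!ATTLIST span bodyref CDATA #IMPLIED>")
result = soup.prettify(formatter=None)
print(result)
```

The helper only looks at the top-level nodes, as the loops above do, and leaves the document untouched when no Doctype is present.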
