Insert HTML into an element with BeautifulSoup

Question

When I try to insert the following HTML into an element:

<div class="frontpageclass"><h3 id="feature_title">The Title</h3>... </div>

bs4 replaces it like this:

<div class="frontpageclass">&lt;h3 id="feature_title"&gt;The Title &lt;/h3&gt;... &lt;div&gt;</div>

I am using .string, and it is still messing up the format.

with open(html_frontpage) as fp:
   soup = BeautifulSoup(fp,"html.parser")

found_data = soup.find(class_= 'front-page__feature-image')
found_data.string = databasedata

If I try to use found_data.string.replace_with, I get a NoneType error. found_data is of type Tag.

There is a similar question, but it targets a div rather than a class.

Recommended answer

Setting an element's .text or .string causes the value to be HTML-encoded, which is the right thing to do. It ensures that the text you insert appears 1:1 when the document is displayed in a browser.
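To make the encoding behavior concrete, here is a minimal sketch. The one-line document and the class name `target` are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical one-element document, used only for this demonstration
soup = BeautifulSoup('<div class="target"></div>', "html.parser")
target = soup.find(class_="target")

# Assigning to .string stores the markup as plain text, not as tags
target.string = "<h3>The Title</h3>"

print(target.find("h3"))  # None: no <h3> element was created
print(str(target))        # the angle brackets are serialized as &lt; and &gt;
```

The tree now holds a single text node, which is why the serialized output contains entities instead of an h3 element.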

If you want to insert actual HTML, you need to insert new nodes into the tree.

from bs4 import BeautifulSoup

# always define a file encoding when working with text files
with open(html_frontpage, encoding='utf8') as fp:
    soup = BeautifulSoup(fp, "html.parser")

target = soup.find(class_= 'front-page__feature-image')

# empty out the target element if needed
target.clear()

# create a temporary document from your HTML, using the same explicit parser
content = '<div class="frontpageclass"><h3 id="feature_title">The Title</h3>...</div>'
temp = BeautifulSoup(content, "html.parser")

# html.parser adds no <html>/<body> wrapper, so the nodes we want to insert are
# the top-level children of `temp`; copy the list, because insert() moves each
# node out of `temp`
nodes_to_insert = list(temp.contents)

# insert them, in source order
for i, node in enumerate(nodes_to_insert):
    target.insert(i, node)
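
The same steps can be wrapped in a small helper and tried on an in-memory document. This is only a sketch: `set_inner_html` is a hypothetical name, not part of the bs4 API, and the sample markup is shortened from the question.

```python
from bs4 import BeautifulSoup

def set_inner_html(target, html):
    """Replace the children of `target` with nodes parsed from `html`.

    Hypothetical helper for illustration; bs4 does not provide it.
    """
    fragment = BeautifulSoup(html, "html.parser")
    target.clear()
    # Copy the list first: append() moves each node out of `fragment`
    for node in list(fragment.contents):
        target.append(node)

# Stand-in for the front page file from the question
soup = BeautifulSoup(
    '<div class="front-page__feature-image">old</div>', "html.parser"
)
target = soup.find(class_="front-page__feature-image")
set_inner_html(
    target,
    '<div class="frontpageclass"><h3 id="feature_title">The Title</h3></div>',
)
print(soup)
```

Because the content is parsed into real nodes before insertion, `soup.find(id="feature_title")` locates the new h3 element afterwards.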
