Insert an HTML string into a BeautifulSoup object

Question

I am trying to insert an HTML string into a BeautifulSoup object. If I insert the string directly, bs4 sanitizes (escapes) the HTML. If I instead create a soup from the HTML string and insert that, I run into problems using the find function afterwards. This thread on SO suggests that inserting BeautifulSoup objects can cause problems. I am using the solution from that post and recreating the soup each time I do an insert.

But surely there is a better way to insert an HTML string into a soup.

Here is some code that shows the problem:

from bs4 import BeautifulSoup

mainSoup = BeautifulSoup("""
<html>
    <div class='first'></div>
    <div class='second'></div>
</html>
""")

extraSoup = BeautifulSoup('<span class="first-content"></span>')

tag = mainSoup.find(class_='first')
tag.insert(1, extraSoup)

print(mainSoup.find(class_='second'))
# prints None (the unexpected result reported here)

Recommended answer

The simplest way, if you already have an HTML string, is to parse it into another BeautifulSoup object and insert that.

from bs4 import BeautifulSoup

doc = '''
<div>
 test1
</div>
'''

soup = BeautifulSoup(doc, 'html.parser')

soup.div.append(BeautifulSoup('<div>insert1</div>', 'html.parser'))

print(soup.prettify())

Output:

<div>
 test1
<div>
 insert1
</div>
</div>
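
To illustrate why the string is parsed first, here is a minimal sketch contrasting the two approaches. It assumes the built-in 'html.parser'. The variable names (`as_text`, `as_markup`, and so on) are made up for this illustration and do not come from the original answer. Appending a plain Python string creates a text node, so the markup is escaped on output. Appending a parsed object inserts real tags.

```python
from bs4 import BeautifulSoup

html = '<b>insert1</b>'

# Case 1: append the raw string. bs4 stores it as text, so the angle
# brackets are escaped when the tree is serialized.
as_text = BeautifulSoup('<div></div>', 'html.parser')
as_text.div.append(html)
text_result = str(as_text)

# Case 2: parse the string first, then append the parsed object.
# The <b> element becomes a real tag inside the div.
as_markup = BeautifulSoup('<div></div>', 'html.parser')
as_markup.div.append(BeautifulSoup(html, 'html.parser'))
markup_result = str(as_markup)

print(text_result)
print(markup_result)
```

In the second case `as_markup.find('b')` returns the inserted tag. In the first case it returns None, because no `<b>` element exists in the tree.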

Update 1

How about this? The idea is to let BeautifulSoup parse the string into the right tree node (the span tag) and insert that tag, rather than the whole BeautifulSoup object. This appears to avoid the "None" problem.

from bs4 import BeautifulSoup

mainSoup = BeautifulSoup("""
<html>
    <div class='first'></div>
    <div class='second'></div>
</html>
""", 'html.parser')

extraSoup = BeautifulSoup('<span class="first-content"></span>', 'html.parser')
tag = mainSoup.find(class_='first')
tag.insert(1, extraSoup.span)

print(mainSoup.find(class_='second'))

Output:

<div class="second"></div>
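
If the HTML string may contain more than one top-level element, `extraSoup.span` picks up only the first span. One possible generalization, sketched here under the assumption that the built-in 'html.parser' is used, is to move every top-level node of the parsed fragment into the target tag. The helper name `append_html` is hypothetical. It is not part of bs4 or of the original answer.

```python
from bs4 import BeautifulSoup


def append_html(parent, html):
    """Parse an HTML string and append its top-level nodes to `parent`.

    Hypothetical helper for illustration; not a bs4 API.
    """
    fragment = BeautifulSoup(html, 'html.parser')
    # Copy the child list first: appending a node moves it out of
    # `fragment`, which would otherwise mutate the list during iteration.
    for node in list(fragment.contents):
        parent.append(node)


main_soup = BeautifulSoup(
    "<html><div class='first'></div><div class='second'></div></html>",
    'html.parser',
)
first = main_soup.find(class_='first')
append_html(first, '<span class="first-content"></span><b>more</b>')

# Later searches still work because only Tag/text nodes were inserted.
second = main_soup.find(class_='second')
first_html = str(first)
print(first_html)
print(second)
```

Because the helper inserts the fragment's children and not the fragment object itself, the main tree never contains a nested BeautifulSoup object.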
