Insert html string into BeautifulSoup object


Problem description

I am trying to insert an HTML string into a BeautifulSoup object. If I insert it directly, bs4 sanitizes the HTML. If I instead parse the HTML string into its own soup and insert that, I have problems using the find function. This thread on SO suggests that inserting BeautifulSoup objects can cause problems. I am using the solution from that post and recreating the soup each time I do an insert.

But surely there's a better way to insert an html string into a soup.

EDIT: I'll add some code as an example of what the problem is

from bs4 import BeautifulSoup

mainSoup = BeautifulSoup("""
<html>
    <div class='first'></div>
    <div class='second'></div>
</html>
""")

extraSoup = BeautifulSoup('<span class="first-content"></span>')

tag = mainSoup.find(class_='first')
tag.insert(1, extraSoup)

print(mainSoup.find(class_='second'))
# prints None

Solution

The simplest way, if you already have an HTML string, is to parse it into its own BeautifulSoup object and insert that.

from bs4 import BeautifulSoup

doc = '''
<div>
 test1
</div>
'''

soup = BeautifulSoup(doc, 'html.parser')

soup.div.append(BeautifulSoup('<div>insert1</div>', 'html.parser'))

print(soup.prettify())

Output:

<div>
 test1
<div>
 insert1
</div>
</div>

Update 1

How about this? The idea is to use BeautifulSoup to generate the right tree node (the span tag) and insert that tag rather than the whole soup. It looks like this avoids the "None" problem.

from bs4 import BeautifulSoup

mainSoup = BeautifulSoup("""
<html>
    <div class='first'></div>
    <div class='second'></div>
</html>
""", 'html.parser')

extraSoup = BeautifulSoup('<span class="first-content"></span>', 'html.parser')
tag = mainSoup.find(class_='first')
tag.insert(1, extraSoup.span)

print(mainSoup.find(class_='second'))

Output:

<div class="second"></div>
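
The update above inserts a single tag (extraSoup.span). If the HTML string holds several top-level nodes, the same idea can be extended by moving each parsed node into the target one at a time. The following is only a minimal sketch under that assumption: append_html is a hypothetical helper name, not part of bs4, and the sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup


def append_html(tag, html):
    """Parse an HTML fragment and move its top-level nodes into `tag`.

    Hypothetical helper for illustration; not a bs4 API.
    """
    fragment = BeautifulSoup(html, "html.parser")
    # Iterate over a copy: extract() removes nodes from fragment.contents.
    for node in list(fragment.contents):
        tag.append(node.extract())
    return tag


main_soup = BeautifulSoup(
    "<div class='first'></div><div class='second'></div>", "html.parser"
)
first = main_soup.find(class_="first")
append_html(first, '<span class="first-content"></span><b>x</b>')

print(main_soup.find(class_="second"))  # <div class="second"></div>
print(first)  # <div class="first"><span class="first-content"></span><b>x</b></div>
```

Because only Tag and string nodes are moved, and never the temporary BeautifulSoup object itself, later find calls on the main soup keep working as in the update above.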
