Insert an HTML string into a BeautifulSoup object
Problem description
I am trying to insert an HTML string into a BeautifulSoup object. If I insert the string directly, bs4 sanitizes (escapes) the HTML. If I instead create a soup from the HTML string and insert that, I run into problems with the find
function. A thread on SO suggests that inserting BeautifulSoup objects can cause problems. I am using the solution from that thread and recreating the soup each time I do an insert.
But surely there's a better way to insert an HTML string into a soup.
EDIT: Here is some code as an example of the problem.
from bs4 import BeautifulSoup
mainSoup = BeautifulSoup("""
<html>
<div class='first'></div>
<div class='second'></div>
</html>
""")
extraSoup = BeautifulSoup('<span class="first-content"></span>')
tag = mainSoup.find(class_='first')
tag.insert(1, extraSoup)
print(mainSoup.find(class_='second'))
# prints None (as observed by the asker; more recent bs4 releases may behave differently)
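For reference, the "sanitizing" mentioned in the question can be reproduced with a short sketch. It assumes the built-in 'html.parser' backend; the variable names are illustrative only. A plain Python string appended to a tag is stored as text, so its markup characters are escaped on output and no span element is created.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div class='first'></div>", 'html.parser')

# Appending a plain str adds a text node, not parsed markup.
soup.div.append('<span class="first-content"></span>')

# The angle brackets are escaped when the tree is serialized.
print(soup)
# → <div class="first">&lt;span class="first-content"&gt;&lt;/span&gt;</div>

# No <span> element exists in the tree, so find() returns None.
print(soup.find('span'))
# → None
```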
Answer
The simplest way, if you already have an HTML string, is to parse it into another BeautifulSoup object and insert that.
from bs4 import BeautifulSoup
doc = '''
<div>
test1
</div>
'''
soup = BeautifulSoup(doc, 'html.parser')
soup.div.append(BeautifulSoup('<div>insert1</div>', 'html.parser'))
print(soup.prettify())
Output:
<div>
test1
<div>
insert1
</div>
</div>
Update 1
How about this? The idea is to let BeautifulSoup parse the string and then insert the resulting node (the span tag, extraSoup.span) rather than the whole BeautifulSoup object. This appears to avoid the "None" problem.
from bs4 import BeautifulSoup
mainSoup = BeautifulSoup("""
<html>
<div class='first'></div>
<div class='second'></div>
</html>
""", 'html.parser')
extraSoup = BeautifulSoup('<span class="first-content"></span>', 'html.parser')
tag = mainSoup.find(class_='first')
tag.insert(1, extraSoup.span)
print(mainSoup.find(class_='second'))
Output:
<div class="second"></div>
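The approach in Update 1 relies on knowing that the fragment is a single span. As a minimal sketch of a more general variant, the helper below (append_html is a hypothetical name, not part of bs4) parses the string with 'html.parser' and moves each top-level node into the target tag, so fragments with several sibling elements are handled too.

```python
from bs4 import BeautifulSoup


def append_html(tag, html):
    """Parse `html` and append each of its top-level nodes to `tag`.

    Hypothetical helper for illustration; not a bs4 API.
    """
    fragment = BeautifulSoup(html, 'html.parser')
    # Iterate over a copy: appending a node detaches it from `fragment`,
    # which would otherwise mutate the list during iteration.
    for node in list(fragment.contents):
        tag.append(node)
    return tag


main_soup = BeautifulSoup(
    "<html><div class='first'></div><div class='second'></div></html>",
    'html.parser',
)
first = main_soup.find(class_='first')
append_html(first, '<span class="first-content"></span><b>more</b>')

# Searching the main soup still works after the insert.
print(main_soup.find(class_='second'))
# → <div class="second"></div>
print(first)
# → <div class="first"><span class="first-content"></span><b>more</b></div>
```

Only Tag and text nodes end up in the main tree here, never a nested BeautifulSoup object, which is the same idea as inserting extraSoup.span.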