Remove lines getting empty after BeautifulSoup decompose
Problem description
I am trying to strip certain HTML tags and their content from a file with BeautifulSoup. How can I remove the lines that become empty after applying decompose()? In this example, I want the line between a and 3 to be gone, since that is where the <span>...</span> block was, but not the empty line at the end.
from bs4 import BeautifulSoup
Rmd_data = 'a\n<span class="answer">\n2\n</span>\n3\n'
print(Rmd_data)
# OUTPUT
# a
# <span class="answer">
# 2
# </span>
# 3
#
# END OUTPUT
soup = BeautifulSoup(Rmd_data, "html.parser")
answers = soup.find_all("span", "answer")
for a in answers:
    a.decompose()  # remove the tag and everything inside it
Rmd_data = str(soup)
print(Rmd_data)
# OUTPUT
# a
#
# 3
#
# END OUTPUT
Recommended answer
I'm surprised that BeautifulSoup does not offer a prettify()-like option for this. Instead of manipulating the HTML manually, you could re-parse it:
str(BeautifulSoup(str(soup), 'html.parser'))
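As far as I can tell, html.parser keeps whitespace inside text nodes, so re-parsing alone may still leave the blank line in place. If it does, one fallback is to post-process the string returned by str(soup). The sketch below uses only the standard library. The helper name drop_blank_lines is hypothetical and not part of BeautifulSoup, and the input string is the post-decompose output shown in the question.

```python
import re


def drop_blank_lines(text: str) -> str:
    """Collapse each run of empty or whitespace-only lines into one newline.

    Hypothetical helper for illustration; it is not part of BeautifulSoup.
    """
    # "\n", optional whitespace, "\n" means at least one blank line in between.
    return re.sub(r"\n\s*\n", "\n", text)


# The string the question obtains from str(soup) after decompose().
after_decompose = "a\n\n3\n"
cleaned = drop_blank_lines(after_decompose)
print(repr(cleaned))
```

Note that this removes every blank line in the text, including any that were already in the original file, so it only fits when blank lines carry no meaning. The single trailing newline is kept.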