Remove lines left empty after BeautifulSoup decompose()


Problem description

I am trying to strip certain HTML tags and their content from a file with BeautifulSoup. How can I remove the lines that become empty after applying decompose()? In this example, I want the line between a and 3 to be gone, since that is where the <span>...</span> block was, but not the trailing line at the end.

from bs4 import BeautifulSoup     

Rmd_data = 'a\n<span class="answer">\n2\n</span>\n3\n'
print(Rmd_data)

# OUTPUT
# a
# <span class="answer">
# 2
# </span>
# 3
# 
# END OUTPUT

soup = BeautifulSoup(Rmd_data, "html.parser")
answers = soup.find_all("span", "answer")
for a in answers:
    a.decompose()

Rmd_data = str(soup)
print(Rmd_data)

# OUTPUT
# a
#
# 3
# 
# END OUTPUT

Recommended answer

I'm surprised that BeautifulSoup does not offer a prettify() option for this. Instead of manipulating the HTML manually, you could re-parse it:

str(BeautifulSoup(str(soup), 'html.parser'))
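As far as I can tell, re-parsing merges the adjacent text nodes that decompose() leaves behind, but it keeps the newline characters themselves, so the blank line may survive. A minimal alternative sketch, assuming each removed tag sits on its own line(s) and the tags are not nested inside each other: trim one newline from the following text node before decomposing. The helper name `remove_with_line` is hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, NavigableString


def remove_with_line(tag):
    """Decompose `tag`; if it sat on its own line(s), also drop one following newline.

    Hypothetical helper for illustration, not a BeautifulSoup API.
    """
    prev, nxt = tag.previous_sibling, tag.next_sibling
    # The tag starts a line if nothing precedes it or the preceding text ends in a newline.
    starts_line = prev is None or (
        isinstance(prev, NavigableString) and prev.endswith("\n")
    )
    if starts_line and isinstance(nxt, NavigableString) and nxt.startswith("\n"):
        # Replace the following text node with a copy minus its leading newline.
        nxt.replace_with(NavigableString(nxt[1:]))
    tag.decompose()


Rmd_data = 'a\n<span class="answer">\n2\n</span>\n3\n'
soup = BeautifulSoup(Rmd_data, "html.parser")
for a in soup.find_all("span", "answer"):
    remove_with_line(a)

cleaned = str(soup)
print(repr(cleaned))
```

For the example input this should give 'a\n3\n': the line where the span was is gone and the trailing newline is kept. Unlike collapsing every run of newlines with a regular expression, this leaves blank lines that were already in the file untouched.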

Enjoy, as always.
