Replace <br> with a space in BeautifulSoup output
Problem description
I am scraping a few links with BeautifulSoup; however, it seems to completely ignore <br> tags.
Here is the relevant portion of the source code of the page I am scraping:
<h1 class="para-title">A quick brown fox jumps over<br>the lazy dog
<span id="something"></span></h1>
Here is my BeautifulSoup code (relevant part only) to get the text within the h1 tag:
soup = BeautifulSoup(page, 'html.parser')
title_box = soup.find('h1', attrs={'class': 'para-title'})
title = title_box.text.strip()
print(title)
This gives the following output:
A quick brown fox jumps overthe lazy dog
I expect this instead:
A quick brown fox jumps over the lazy dog
How can I replace <br> with a space in my code?
Recommended answer
How about using .get_text() with the separator parameter?
from bs4 import BeautifulSoup
page = '''<h1 class="para-title">A quick brown fox jumps over<br>the lazy dog
<span>some stuff here</span></h1>'''
soup = BeautifulSoup(page, 'html.parser')
title_box = soup.find('h1', attrs={'class': 'para-title'})
title = title_box.get_text(separator=" ").strip()
print(title)
Output:
A quick brown fox jumps over the lazy dog
some stuff here
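Note that the separator is placed between every pair of text fragments, not only where a <br> occurs, and the newline before the span survives in the result. Two possible variations are sketched below, reusing the same sample HTML: passing strip=True so each fragment is trimmed before joining, or swapping only the <br> tags for a space with replace_with before reading the text. The variable names joined and replaced are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

page = '''<h1 class="para-title">A quick brown fox jumps over<br>the lazy dog
<span>some stuff here</span></h1>'''

soup = BeautifulSoup(page, 'html.parser')
title_box = soup.find('h1', attrs={'class': 'para-title'})

# Variant 1: strip=True trims each text fragment and drops empty ones
# before joining them with the separator, so the result is a single line.
joined = title_box.get_text(separator=" ", strip=True)
print(joined)
# A quick brown fox jumps over the lazy dog some stuff here

# Variant 2: replace only the <br> tags with a literal space, leaving
# all other whitespace (including the newline before <span>) untouched.
for br in title_box.find_all('br'):
    br.replace_with(' ')
replaced = title_box.get_text().strip()
print(replaced)
# A quick brown fox jumps over the lazy dog
# some stuff here
```

Variant 2 modifies the parsed tree in place, so it is best applied after any other processing that still needs the original <br> tags.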